Abstract

Resolving a conjecture of Abbe, Bandeira and Hall, the authors have recently shown that 
the semidefinite programming (SDP) relaxation of the maximum likelihood estimator achieves 
the sharp threshold for exactly recovering the community structure under the binary stochastic 
block model of two equal-sized clusters. The same was shown for the case of a single cluster and 
outliers. Extending the proof techniques, in this paper it is shown that SDP relaxations also 
achieve the sharp recovery threshold in the following cases: (1) Binary stochastic block model 
with two clusters of sizes proportional to network size but not necessarily equal; (2) Stochastic 
block model with a fixed number of equal-sized clusters; (3) Binary censored block model with 
the background graph being Erdős-Rényi. Furthermore, a sufficient condition is given for an
SDP procedure to achieve exact recovery for the general case of a fixed number of clusters 
plus outliers. These results demonstrate the versatility of SDP relaxation as a simple, general 
purpose, computationally feasible methodology for community detection. 


1 Introduction 

The stochastic block model (SBM) [27], also known as the planted partition model [15], is a
popular statistical model for studying the community detection and graph partitioning problem (see,
e.g., [30, 16, 35, 33, 29, 13, 14, 3] and the references therein). In its simple form, it assumes that out
of a total of $n$ vertices, $K_1 + \cdots + K_r$ of them are partitioned into $r$ clusters with sizes
$K_1, \ldots, K_r$, and the remaining $n - (K_1 + \cdots + K_r)$ vertices do not belong to any cluster
(called outlier vertices); a random graph $G$ is then generated based on the cluster structure, where
each pair of vertices is connected independently with probability $p$ if they are in the same cluster
or $q$ otherwise. In this paper, we focus on the problem of exactly recovering the clusters (up to a
permutation of cluster indices) based on the graph $G$.
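The generative model just described can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `sample_sbm`, the convention that clusters occupy consecutive vertex indices, and the use of `None` as the outlier label are all hypothetical choices, not part of the paper.

```python
import random

def sample_sbm(n, cluster_sizes, p, q, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from the SBM with outliers.

    Assumed layout: vertices 0..K_1-1 form cluster 0, the next K_2 vertices
    form cluster 1, and so on; the remaining vertices are outliers (label None).
    """
    rng = random.Random(seed)
    labels = []
    for k, size in enumerate(cluster_sizes):
        labels.extend([k] * size)
    if len(labels) > n:
        raise ValueError("cluster sizes exceed n")
    labels.extend([None] * (n - len(labels)))  # outlier vertices

    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # probability p inside a common cluster, q for every other pair
            same = labels[i] is not None and labels[i] == labels[j]
            if rng.random() < (p if same else q):
                A[i][j] = A[j][i] = 1
    return A, labels
```

For instance, `sample_sbm(6, [2, 3], 1.0, 0.0)` produces two cliques (of sizes 2 and 3) and one isolated outlier vertex.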

In the setting of two equal-sized clusters or a single cluster plus outlier vertices, recently it has
been shown in [24] that the semidefinite programming (SDP) relaxation of the maximum likelihood
(ML) estimator achieves the optimal recovery threshold with high probability, in the asymptotic
regime of $p = a\log n/n$ and $q = b\log n/n$ for fixed constants $a, b$ and cluster sizes growing
linearly in $n$ as $n \to \infty$. The result for two equal-sized clusters was originally conjectured
in [2] and another resolution was recently given in [10] independently.

In this paper, we extend the optimality of SDP to the following three cases, while still assuming
$p = a\log n/n$ and $q = b\log n/n$ with $a > b > 0$:

• Stochastic block model with two asymmetric clusters: the first cluster consists of $K$ vertices
and the second cluster consists of $n - K$ vertices with $K = \lceil \rho n \rceil$ for some
$\rho \in [0, 1/2]$. The value of $\rho$ may be known or unknown to the recovery procedure.

• Stochastic block model with $r$ clusters of equal size $K$: $r \ge 2$ is a fixed integer and $n = rK$.

• Censored block model with two clusters: given an Erdős-Rényi random graph $G \sim \mathcal{G}(n,p)$,
each edge $(i,j)$ has a label $L_{ij} \in \{\pm 1\}$ independently drawn according to the distribution:

\[ P_{L_{ij} \mid \sigma_i^*, \sigma_j^*} = (1-\epsilon)\, 1_{\{L_{ij} = \sigma_i^* \sigma_j^*\}} + \epsilon\, 1_{\{L_{ij} = -\sigma_i^* \sigma_j^*\}}, \]

where $\sigma_i^* = 1$ if vertex $i$ is in the first cluster and $\sigma_i^* = -1$ otherwise;
$\epsilon \in [0, 1/2]$ is a fixed constant.$^1$
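The censored block model in the last item can be simulated as follows. This is a minimal sketch: the function name and the list-of-lists representation of the labeled graph (entries in $\{-1, 0, +1\}$, with 0 meaning "no edge") are assumptions made for illustration.

```python
import random

def sample_censored_block_model(sigma, p, eps, seed=0):
    """Return a symmetric matrix A with entries in {-1, 0, +1}.

    sigma: list of +1/-1 cluster labels (hypothetical input format).
    Each pair {i, j} is an edge with probability p; an edge receives the label
    sigma[i] * sigma[j] with probability 1 - eps and the opposite label otherwise.
    """
    rng = random.Random(seed)
    n = len(sigma)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:          # edge of the background graph
                label = sigma[i] * sigma[j]
                if rng.random() < eps:    # label flipped with probability eps
                    label = -label
                A[i][j] = A[j][i] = label
    return A
```

With `eps = 0` every observed label equals $\sigma_i^*\sigma_j^*$, and with `eps = 1/2` the labels carry no information about the clusters.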

In all three cases, we show that a necessary condition for the maximum likelihood (ML) estimator
to succeed coincides with a sufficient condition for the correctness of the SDP procedure,
thereby establishing both the optimal recovery threshold and the optimality of the SDP relaxation.
The proof techniques in this paper are similar to those in [24]; however, the construction
and validation of dual certificates for the success of SDP are more challenging, especially in the
multiple-cluster case. Notably, we resolve an open problem raised in [1, Section 6] about the optimal
recovery threshold in the censored block model and show that the optimal recovery threshold
can be achieved in polynomial time via SDP.

To further investigate the applicability of SDP procedures for community detection, we explored
two cases for which the algorithm is adaptive to the unknown cluster sizes. First, we found
that for two clusters, the conditions for exact recovery are the strongest in the equal-sized case.
This suggests, and it is shown in Section 2.2, that if the cluster size constraint is replaced by an
appropriate Lagrangian term not depending on the cluster size, exact recovery is achieved for all
cluster sizes under the condition required for two equal-sized clusters. Secondly, we examined the
general community detection problem with a fixed number of unequal-sized clusters and with outlier
vertices, and identified a sufficient condition for the SDP procedure to achieve exact recovery
with knowledge of only the smallest cluster size and the parameters $a, b$. (See Section 5.)

The optimality result of SDP has recently been extended to the cases with o(log n) number of 
equal-sized clusters in [4] and a fixed number of clusters with unequal sizes in [37]. 

Parallel independent work The exact recovery problem in the logarithmic sparsity regime has
been independently studied in [3] in a more general setting: Given a fixed $r \times r$ matrix $Q$ and a
probability vector $s = (s_1, \ldots, s_r)$, the cardinality of the $k$-th community is assumed to be
$s_k n$ and vertices in the $k$-th and $\ell$-th community are connected independently with probability
$Q_{k\ell} \log n/n$. The optimal recovery threshold is obtained as a function of $Q$ and $s$. In the
special setting of $Q_{k\ell} = a$ if $k = \ell$ and $Q_{k\ell} = b$ if $k \neq \ell$, for two asymmetric
clusters or multiple equal-sized clusters, their general optimal threshold reduces to those derived in
this paper. Assuming full knowledge of the parameters $Q$ and $s$, the optimal recovery threshold is
further shown in [3] to be achievable in $o(n^{1+\epsilon})$ time for all $\epsilon > 0$ via a two-phase
procedure, consisting of a partial recovery algorithm followed by a cleanup step.

$^1$ Under the censored block model, the graph itself does not contain any information about the underlying clusters
and we are interested in recovering the clusters by observing the graph and edge labels.

For the case of $r$ equal-sized clusters, it is independently shown in [44] that the optimal recovery
threshold can be obtained in polynomial time. Their clustering algorithm is a two-step procedure
similar to [3], where the partial recovery is achieved via a simple spectral algorithm. For the case
with two unequal-sized clusters, a sufficient (but not tight) recovery condition is also derived in
[44].

Further literature on SDP for cluster recovery There has been a recent surge of interest in 
analyzing the semidefinite programming relaxation approach for cluster recovery; some of the latest
developments are summarized below. For different recovery approaches such as spectral methods,
we refer the reader to [14, 3] for details. 

The SDP approach is mostly analyzed in the regime where the average degrees scale as $\log n$,
with the objective of exact cluster recovery. In this setting, the analysis often relies on the standard 
technique of dual witnesses, which amounts to constructing the dual variables so that the desired 
KKT conditions are satisfied for the primal variable corresponding to the true clusters. The SDP 
has been applied to recover cliques or densest subgraphs in [6, 7, 5]. For the stochastic block model 
with possibly unbounded number of clusters, a sufficient condition for an SDP procedure to achieve 
exact recovery is obtained in [14], which improves the sufficient conditions in [13, 36] in terms of 
scaling. Various formulations of SDP for cluster recovery are discussed in [8]. The robustness of the 
SDP has been investigated in [18] for minimum bisection in the semirandom model with monotone 
adversary and, more recently, in [11] for generalized SBM with arbitrary outlier vertices. The SDP 
machinery has also been applied to recover clusters with partially observed graphs [12, 41] and 
binary matrices [43]. In the converse direction, necessary conditions for the success of particular 
SDPs are obtained in [42, 14]. In contrast to the previous work mentioned above where the constants 
are often loose, the recent line of work initiated by [2, 1], and followed by [24, 10] and the current
paper, focuses on establishing necessary and sufficient conditions in the special case of a fixed number
of clusters with sharp constants, attained via SDP relaxations. 

In the sparse graph case with bounded average degree, exact recovery is provably impossible and 
instead the goal is to achieve partial recovery, namely, to correctly cluster all but a small fraction of 
vertices. Using Grothendieck’s inequality, a sufficient condition for SDP to achieve partial recovery 
is obtained in [21]; the technique is extended to the labeled stochastic block model in [28]. In [32], 
an SDP-based test is applied to distinguish the binary symmetric stochastic block model versus the 
Erdős-Rényi random graph and shown to attain the optimal detection threshold.

Notation Denote the identity matrix by $\mathbf{I}$, the all-one matrix by $\mathbf{J}$ and the all-one
vector by $\mathbf{1}$. We write $X \succeq 0$ if $X$ is symmetric and positive semidefinite and $X \ge 0$
if all the entries of $X$ are non-negative. Let $\mathcal{S}^n$ denote the set of all $n \times n$
symmetric matrices. For $X \in \mathcal{S}^n$, let $\lambda_2(X)$ denote its second smallest eigenvalue.
For any matrix $Y$, let $\|Y\|$ denote its spectral norm. For any positive integer $n$, let
$[n] = \{1, \ldots, n\}$. For any set $T \subset [n]$, let $|T|$ denote its cardinality and $T^c$ denote
its complement. For $p \in [0,1]$, let $\bar{p} = 1 - p$. We use standard big $O$ notations,
e.g., for any sequences $\{a_n\}$ and $\{b_n\}$, $a_n = \Theta(b_n)$ if there is an absolute constant
$c > 0$ such that $1/c \le a_n/b_n \le c$; $a_n = \Omega(b_n)$ or $b_n = O(a_n)$ if there exists an absolute
constant $c > 0$ such that $a_n/b_n \ge c$. Let ${\rm Bern}(p)$ denote the Bernoulli distribution with mean
$p$ and ${\rm Binom}(n,p)$ denote the binomial distribution with $n$ trials and success probability $p$.
All logarithms are natural and we use the convention $0 \log 0 = 0$.
2 Binary asymmetric SBM


2.1 Known cluster size 


Let $A$ denote the adjacency matrix of the graph, and $(C_1^*, C_2^*)$ denote the underlying true
partition, where the clusters $C_1^*$ and $C_2^*$ have cardinalities $K$ and $n - K$, respectively, and
we consider the asymptotic regime $K = \lceil \rho n \rceil$ as $n \to \infty$ for $\rho \in [0, 1/2]$
fixed. In this subsection we assume that $\rho$ is known to the recovery procedure and the goal is to
obtain the $\rho$-dependent optimal recovery threshold attained by SDP relaxations.

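The cluster structure under the binary stochastic block model can be represented by a vector
$\sigma \in \{\pm 1\}^n$ such that $\sigma_i = 1$ if vertex $i$ is in the first cluster and
$\sigma_i = -1$ otherwise. Let $\sigma^*$ correspond to the true clusters. Then the ML estimator of
$\sigma^*$ for the case $a > b$ can be simply stated as

\[
\begin{aligned}
\max_{\sigma} \quad & \sum_{i,j} A_{ij} \sigma_i \sigma_j \\
\text{s.t.} \quad & \sigma_i \in \{\pm 1\}, \quad i \in [n] \\
& \sigma^\top \mathbf{1} = 2K - n,
\end{aligned} \tag{1}
\]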

which maximizes the number of in-cluster edges minus the number of out-cluster edges subject
to the cluster size constraint. If $K = n/2$, (1) reduces to the minimum graph bisection problem
which is NP-hard in the worst case. Due to the computational intractability of the ML estimator,
next we turn to its convex relaxation. Let $Y = \sigma\sigma^\top$. Then $Y_{ii} = 1$ is equivalent to
$\sigma_i = \pm 1$, and $\sigma^\top \mathbf{1} = \pm(2K - n)$ if and only if
$\langle Y, \mathbf{J} \rangle = (2K - n)^2$. Therefore, (1) can be recast as$^2$

\[
\begin{aligned}
\max_{Y} \quad & \langle A, Y \rangle \\
\text{s.t.} \quad & {\rm rank}(Y) = 1 \\
& Y_{ii} = 1, \quad i \in [n] \\
& \langle \mathbf{J}, Y \rangle = (2K - n)^2.
\end{aligned} \tag{2}
\]


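Notice that any feasible solution is a rank-one positive semidefinite matrix. Relaxing this condition
by dropping the rank-one restriction, we obtain the following convex relaxation of (2), which is a
semidefinite program:

\[
\begin{aligned}
\hat{Y}_{\rm SDP} = \arg\max_{Y} \quad & \langle A, Y \rangle \\
\text{s.t.} \quad & Y \succeq 0 \\
& Y_{ii} = 1, \quad i \in [n] \\
& \langle \mathbf{J}, Y \rangle = (2K - n)^2.
\end{aligned} \tag{3}
\]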


We note that the only model parameter needed by the estimator (3) is the cluster size K. 

Let $Y^* = \sigma^*(\sigma^*)^\top$ correspond to the true partition and
$\mathcal{Y}_n = \{\sigma\sigma^\top : \sigma \in \{\pm 1\}^n, \sigma^\top \mathbf{1} = 2K - n\}$
denote the set of all admissible partitions. The following result establishes the optimality of the
SDP procedure.

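Theorem 1. If $\eta(\rho, a, b) > 1$, then
$\min_{Y^* \in \mathcal{Y}_n} \mathbb{P}\{\hat{Y}_{\rm SDP} = Y^*\} \ge 1 - n^{-\Omega(1)}$ as $n \to \infty$, where

\[ \eta(\rho, a, b) = \frac{a+b}{2} - \gamma + \frac{(\bar{\rho} - \rho)\tau}{2} \log \frac{\rho\,(\gamma + (\bar{\rho} - \rho)\tau)}{\bar{\rho}\,(\gamma - (\bar{\rho} - \rho)\tau)} \tag{4} \]

with $\bar{\rho} = 1 - \rho$, $\tau = \frac{a-b}{\log a - \log b}$,
$\gamma = \sqrt{(\bar{\rho} - \rho)^2 \tau^2 + 4\rho\bar{\rho}ab}$, and
$\eta(0, a, b) = \lim_{\rho \to 0} \eta(\rho, a, b) = \frac{a+b}{2} - \tau \log \frac{e\sqrt{ab}}{\tau}$.

$^2$ Henceforth, all matrix variables in the optimization are symmetric.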
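As a sanity check on the threshold function, the sketch below evaluates $\eta(\rho, a, b)$ numerically. It assumes the reading $\eta = \frac{a+b}{2} - \gamma + \frac{\beta}{2}\log\frac{\rho(\gamma+\beta)}{\bar\rho(\gamma-\beta)}$ with $\beta = (\bar\rho - \rho)\tau$, which is the form consistent with the stated values at $\rho = 1/2$ and $\rho \to 0$; the function name `eta` is a hypothetical choice.

```python
import math

def eta(rho, a, b):
    """Threshold function eta(rho, a, b) for a > b > 0 and 0 <= rho <= 1."""
    tau = (a - b) / (math.log(a) - math.log(b))
    if rho == 0 or rho == 1:
        # limiting value as rho -> 0 (and, by symmetry, as rho -> 1)
        return (a + b) / 2 - tau * math.log(math.e * math.sqrt(a * b) / tau)
    rho_bar = 1 - rho
    beta = (rho_bar - rho) * tau
    gamma = math.sqrt(beta ** 2 + 4 * rho * rho_bar * a * b)
    log_term = math.log(rho * (gamma + beta) / (rho_bar * (gamma - beta)))
    return (a + b) / 2 - gamma + 0.5 * beta * log_term

def sdp_recovers_known_size(rho, a, b):
    """Sufficient condition of Theorem 1: eta(rho, a, b) > 1."""
    return eta(rho, a, b) > 1
```

For example, `eta(0.5, a, b)` reduces to $(\sqrt{a} - \sqrt{b})^2/2$, and `eta(rho, a, b)` agrees with `eta(1 - rho, a, b)` up to rounding.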

The proof of Theorem 1 is similar in outline to the proof given in [24], but a considerable
detour is needed to handle the imbalance. Notice that by definition, $\eta(\rho, a, b) = \eta(\bar{\rho}, a, b)$,
and $\eta(1/2, a, b) = \frac{1}{2}(\sqrt{a} - \sqrt{b})^2$. The threshold function $\eta(\rho, a, b)$ turns
out to be the error exponent in the following large deviation events. For vertex $i$, let $e(i, C_1^*)$
denote the number of edges between vertex $i$ and vertices in $C_1^*$, and define $e(i, C_2^*)$ similarly. Then,

\[ \mathbb{P}\{e(i, C_1^*) - e(i, C_2^*) \le \tau(\rho - \bar{\rho}) \log n\} = n^{-\eta(\rho, a, b) + o(1)}, \quad \forall i \in C_1^*, \]

\[ \mathbb{P}\{e(i, C_2^*) - e(i, C_1^*) \le \tau(\bar{\rho} - \rho) \log n\} = n^{-\eta(\rho, a, b) + o(1)}, \quad \forall i \in C_2^*. \]

Next we prove a converse for Theorem 1 which shows that the recovery threshold achieved by 
the SDP relaxation is in fact optimal. 

Theorem 2. If $\eta(\rho, a, b) < 1$ and $\sigma^*$ is uniformly chosen over
$\{\sigma \in \{\pm 1\}^n : \sigma^\top \mathbf{1} = 2K - n\}$, then
for any sequence of estimators $\hat{Y}_n$, $\mathbb{P}\{\hat{Y}_n = Y^*\} \to 0$.

In the special case with two equal-sized clusters, we have $K = n/2$ and
$\eta(1/2, a, b) = \frac{1}{2}(\sqrt{a} - \sqrt{b})^2$. The corresponding threshold
$(\sqrt{a} - \sqrt{b})^2 > 2$ has been established in [2, 34], and the
achievability by SDP has been shown in [24] and independently by [10] later.

A recent work [44] also studies the exact recovery problem in the unbalanced case and provides
a sufficient (but not tight) recovery condition for a polynomial-time two-step procedure based on
the spectral method.

2.2 Unknown cluster size 

Theorem 1 shows that if one knows the relative cluster size $\rho$, the SDP relaxation (3) achieves
the size-dependent optimal threshold $\eta(\rho, a, b) > 1$. For fixed $a$ and $b$, $\eta(\rho, a, b)$ is
minimized at $\rho = 1/2$ (see Appendix A for a proof). This suggests that for two communities the
equal-sized case is the most difficult to cluster. Indeed, the next result proves that if there is no
constraint on the cluster size, then the optimal recovery threshold coincides with that in the balanced
case, i.e., $(\sqrt{a} - \sqrt{b})^2 > 2$, which can be achieved by a penalized SDP.

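Theorem 3. Let

\[
\begin{aligned}
\hat{Y}'_{\rm SDP} = \arg\max_{Y} \quad & \langle A, Y \rangle - \lambda^* \langle \mathbf{J}, Y \rangle \\
\text{s.t.} \quad & Y \succeq 0 \\
& Y_{ii} = 1, \quad i \in [n],
\end{aligned} \tag{5}
\]

where $\lambda^* = \tau \log n / n$ and $\tau = \frac{a-b}{\log a - \log b}$. If $(\sqrt{a} - \sqrt{b})^2 > 2$,
then $\min_{Y^* \in \mathcal{Y}'_n} \mathbb{P}\{\hat{Y}'_{\rm SDP} = Y^*\} \ge 1 - n^{-\Omega(1)}$
as $n \to \infty$, where $\mathcal{Y}'_n = \{\sigma\sigma^\top : \sigma \in \{\pm 1\}^n\}$.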

Remark 1. Theorem 3 holds for all cluster sizes $K$, including the extreme case where the entire
network forms a single cluster ($K = 0$), in which case the SDP (5) outputs $Y^* = \mathbf{J}$ with high
probability. The downside is that the penalization parameter $\lambda^*$ depends on the parameters $a$
and $b$. Nevertheless, there exists a fully data-driven choice of $\lambda^*$ based on the degree
distribution of the network, so that Theorem 3 continues to hold whenever the cluster sizes scale linearly,
i.e., $K/n \to \rho \in (0, \frac{1}{2}]$; the price to pay for adaptivity is that the probability of error
vanishes polylogarithmically instead of polynomially as $n \to \infty$. See Appendix B for details.
3 SBM with multiple equal-sized clusters

The cluster structure under the stochastic block model with $r$ clusters of equal size $K$ can be
represented by $r$ binary vectors $\xi_1, \ldots, \xi_r \in \{0,1\}^n$, where $\xi_k$ is the indicator
function of the cluster $k$, such that $\xi_k(i) = 1$ if vertex $i$ is in cluster $k$ and $\xi_k(i) = 0$
otherwise. Let $\xi_1^*, \ldots, \xi_r^*$ correspond to the true clusters and let $A$ denote the adjacency
matrix. Then the maximum likelihood (ML) estimator of $\xi^*$ for the case $a > b$ can be simply stated as

\[
\begin{aligned}
\max_{\xi} \quad & \sum_{i,j} A_{ij} \sum_{k=1}^{r} \xi_k(i)\xi_k(j) \\
\text{s.t.} \quad & \xi_k \in \{0,1\}^n, \quad k \in [r] \\
& \xi_k^\top \mathbf{1} = K, \quad k \in [r] \\
& \xi_k^\top \xi_{k'} = 0, \quad k \neq k',
\end{aligned} \tag{6}
\]

which maximizes the number of in-cluster edges. Alternatively, one can encode the cluster structure
from the vertices' perspective. Each vertex $i$ is associated with a vector $x_i$ which is allowed to be
one of the $r$ vectors $v_1, v_2, \ldots, v_r \in \mathbb{R}^{r-1}$ defined as follows: Take an equilateral
simplex $\Sigma_r$ in $\mathbb{R}^{r-1}$ with vertices $v_1, \ldots, v_r$ such that $\sum_{k=1}^{r} v_k = 0$
and $\|v_k\| = 1$ for $1 \le k \le r$. Notice that $\langle v_k, v_{k'} \rangle = -1/(r-1)$ for $k \neq k'$.
Therefore, the ML estimator given in (6) can be recast as

\[
\begin{aligned}
\max_{x} \quad & \sum_{i,j} A_{ij} \langle x_i, x_j \rangle \\
\text{s.t.} \quad & x_i \in \{v_1, v_2, \ldots, v_r\}, \quad i \in [n] \\
& \sum_{i} x_i = 0.
\end{aligned} \tag{7}
\]
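The simplex vectors used in (7) are easy to construct explicitly. The sketch below embeds them in $\mathbb{R}^r$ (inside the hyperplane orthogonal to the all-one vector) rather than in $\mathbb{R}^{r-1}$; this is an assumption made for simplicity, and it preserves all norms and inner products. The helper names are hypothetical.

```python
import math

def simplex_vectors(r):
    """Return r unit vectors v_1..v_r with sum zero and pairwise inner product -1/(r-1).

    v_k is proportional to e_k - (1/r) * (all-one vector), so it lies in an
    (r-1)-dimensional subspace of R^r.
    """
    scale = math.sqrt(r / (r - 1))
    return [[scale * ((1.0 if i == k else 0.0) - 1.0 / r) for i in range(r)]
            for k in range(r)]

def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))
```

With these vectors, $\langle x_i, x_j \rangle$ equals 1 when vertices $i$ and $j$ share a cluster and $-1/(r-1)$ otherwise, so the objective of (7) is an affine function of the number of in-cluster edges.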

When $r = 2$, the above program includes the NP-hard minimum graph bisection problem as
a special case. Let us consider its convex relaxation similar to the SDP relaxation studied by
Goemans and Williamson [20] for MAX CUT and by Frieze and Jerrum [19] for MAX $k$-CUT and
MAX BISECTION. To obtain an SDP relaxation, we replace $x_i$ by $y_i$ which is allowed to be any
unit vector in $\mathbb{R}^n$ under the constraint $\langle y_i, y_j \rangle \ge -1/(r-1)$ and
$\sum_i y_i = 0$. Defining $Y \in \mathbb{R}^{n \times n}$ such that $Y_{ij} = \langle y_i, y_j \rangle$,
we obtain an SDP:

\[
\begin{aligned}
\max_{Y} \quad & \langle A, Y \rangle \\
\text{s.t.} \quad & Y \succeq 0 \\
& Y_{ii} = 1, \quad i \in [n] \\
& Y_{ij} \ge -1/(r-1), \quad i, j \in [n] \\
& Y\mathbf{1} = 0.
\end{aligned} \tag{8}
\]

We remark that we could as well have worked with the constraint $\langle Y, \mathbf{J} \rangle = 0$, which,
for $Y \succeq 0$, is equivalent to the last constraint in (8). Letting
$Z = \frac{r-1}{r} Y + \frac{1}{r} \mathbf{J}$, we can also equivalently rewrite (8) as

\[
\begin{aligned}
\hat{Z}_{\rm SDP} = \arg\max_{Z} \quad & \langle A, Z \rangle \\
\text{s.t.} \quad & Z \succeq 0 \\
& Z_{ii} = 1, \quad i \in [n] \\
& Z_{ij} \ge 0, \quad i, j \in [n] \\
& Z\mathbf{1} = K\mathbf{1}.
\end{aligned} \tag{9}
\]
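The change of variables between (8) and (9) can be checked on a small planted partition. The sketch below assumes the affine map $Z = \frac{r-1}{r} Y + \frac{1}{r}\mathbf{J}$ and builds the feasible point of (8) that corresponds to a given labeling; the function names and the label format are hypothetical.

```python
def y_from_labels(labels, r):
    """Feasible point of (8) for a partition: 1 within clusters, -1/(r-1) across."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else -1.0 / (r - 1) for j in range(n)]
            for i in range(n)]

def y_to_z(Y, r):
    """Apply Z = ((r-1)/r) * Y + (1/r) * J entrywise."""
    n = len(Y)
    return [[(r - 1) / r * Y[i][j] + 1.0 / r for j in range(n)] for i in range(n)]
```

For three clusters of size two, `y_to_z(y_from_labels([0, 0, 1, 1, 2, 2], 3), 3)` is the 0/1 cluster matrix, whose rows each sum to $K = 2$, matching the constraint $Z\mathbf{1} = K\mathbf{1}$.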

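The only model parameter needed by the estimator (9) is the cluster size $K$. Let
$Z^* = \sum_{k=1}^{r} \xi_k^* (\xi_k^*)^\top$ correspond to the true clusters and define

\[ \mathcal{Z}_{n,r} = \left\{ \sum_{k=1}^{r} \xi_k \xi_k^\top : \xi_k \in \{0,1\}^n, \ \xi_k^\top \mathbf{1} = K, \ \xi_k^\top \xi_{k'} = 0, \ k \neq k' \right\}. \]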

The sufficient condition for the success of SDP in (9) is given as follows.

Theorem 4. If $\sqrt{a} - \sqrt{b} > \sqrt{r}$, then
$\min_{Z^* \in \mathcal{Z}_{n,r}} \mathbb{P}\{\hat{Z}_{\rm SDP} = Z^*\} \ge 1 - n^{-\Omega(1)}$ as $n \to \infty$.

The following result establishes the optimality of the SDP procedure.

Theorem 5. If $\sqrt{a} - \sqrt{b} < \sqrt{r}$ and the clusters are uniformly chosen at random among all
$r$-equal-sized partitions of $[n]$, then for any sequence of estimators $\hat{Z}_n$,
$\mathbb{P}\{\hat{Z}_n = Z^*\} \to 0$ as $n \to \infty$.

The optimal recovery threshold $\sqrt{a} - \sqrt{b} = \sqrt{r}$ is also obtained by two parallel independent
works [44, 3] via a polynomial-time two-step procedure, consisting of a partial recovery algorithm
followed by a cleanup stage. The previous work [14] studies the stochastic block model in a much
more general setting with $r$ clusters of equal size $K$ plus outlier vertices, where $r$, $K$ and the edge
probabilities $p, q$ may scale with $n$ arbitrarily as long as $rK \le n$; it is shown that an SDP achieves
exact recovery with high probability provided that

\[ K^2 (p - q)^2 \ge C \left( K p (1 - q) \log n + q (1 - q) n \right) \tag{10} \]

for some universal constant $C$. In the special setting where the network consists of a fixed number
of clusters without outliers and $p = a \log n/n > q = b \log n/n$, the sufficient condition (10) simplifies
to $\sqrt{a} - \sqrt{b} \ge C' \sqrt{r}$ for some absolute constant $C'$, which is off by a constant factor
compared to the sharp sufficient condition $\sqrt{a} - \sqrt{b} > \sqrt{r}$ given by Theorem 4.

It is straightforward to extend the current proof of Theorem 5 to the regime where $r = \gamma \log^{s} n$,
$p = a \log^{s+1} n / n > q = b \log^{s+1} n / n$ for any fixed $\gamma, a, b > 0$ and $s \in [0,1)$, showing
that SDP achieves the optimal recovery threshold $\sqrt{a} - \sqrt{b} = \sqrt{\gamma}$. Indeed, the preprint
[4] shows similar optimality results of SDP for $r = o(\log n)$ number of equal-sized clusters. Conversely,
it has been recently proved in [25] that SDP relaxations cease to be optimal for logarithmically many
communities in the sense that SDP is constantwise suboptimal when $r \ge C \log n$ for a large enough
constant $C$ and orderwise suboptimal when $r = \omega(\log n)$.

4 Binary censored block model 

Under the binary censored block model, with possibly unequal cluster sizes, the cluster structure
can be represented by a vector $\sigma \in \{\pm 1\}^n$ such that $\sigma_i = 1$ if vertex $i$ is in the
first cluster and $\sigma_i = -1$ if vertex $i$ is in the second cluster. Let $\sigma^* \in \{\pm 1\}^n$
correspond to the true clusters. Let $A$ denote the weighted adjacency matrix such that $A_{ij} = 0$ if
$i, j$ are not connected by an edge; $A_{ij} = 1$ if $i, j$ are connected by an edge with label $+1$;
$A_{ij} = -1$ if $i, j$ are connected by an edge with label $-1$. Then the ML estimator of $\sigma^*$ can
be simply stated as

\[
\begin{aligned}
\max_{\sigma} \quad & \sum_{i,j} A_{ij} \sigma_i \sigma_j \\
\text{s.t.} \quad & \sigma_i \in \{\pm 1\}, \quad i \in [n],
\end{aligned} \tag{11}
\]

which maximizes the number of in-cluster $+1$ edges minus that of in-cluster $-1$ edges, or equivalently,
maximizes the number of cross-cluster $-1$ edges minus that of cross-cluster $+1$ edges. The
NP-hard max-cut problem can be reduced to (11) by simply labeling all the edges in the input
graph as $-1$ edges, and thus (11) is computationally intractable in the worst case. Instead, we
consider the SDP studied in [1] obtained by convex relaxation. Let $Y = \sigma\sigma^\top$. Then
$Y_{ii} = 1$ is equivalent to $\sigma_i = \pm 1$. Therefore, (11) can be recast as

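\[
\begin{aligned}
\max_{Y} \quad & \langle A, Y \rangle \\
\text{s.t.} \quad & {\rm rank}(Y) = 1 \\
& Y_{ii} = 1, \quad i \in [n].
\end{aligned} \tag{12}
\]

Replacing the rank-one constraint by positive semidefiniteness, we obtain the following convex
relaxation of (12), which is an SDP:

\[
\begin{aligned}
\hat{Y}_{\rm SDP} = \arg\max_{Y} \quad & \langle A, Y \rangle \\
\text{s.t.} \quad & Y \succeq 0 \\
& Y_{ii} = 1, \quad i \in [n].
\end{aligned} \tag{13}
\]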

We remark that (13) does not rely on any knowledge of the model parameters. Let
$Y^* = \sigma^*(\sigma^*)^\top$ and $\mathcal{Y}_n = \{\sigma\sigma^\top : \sigma \in \{\pm 1\}^n\}$. The
following result establishes the success condition of the SDP procedure in the scaling regime
$p = a \log n/n$ for a fixed constant $a$:

Theorem 6. If $a(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2 > 1$, then
$\min_{Y^* \in \mathcal{Y}_n} \mathbb{P}\{\hat{Y}_{\rm SDP} = Y^*\} \ge 1 - n^{-\Omega(1)}$ as $n \to \infty$.

Next we prove a converse for Theorem 6 which shows that the recovery threshold achieved by
the SDP relaxation is in fact optimal.

Theorem 7. If $a(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2 < 1$ and $\sigma^*$ is uniformly chosen from
$\{\pm 1\}^n$, then for any sequence of estimators $\hat{Y}_n$, $\mathbb{P}\{\hat{Y}_n = Y^*\} \to 0$ as
$n \to \infty$.

Theorem 7 still holds if the cluster sizes are proportional to $n$ and known to the estimators,
i.e., the prior distribution of $\sigma^*$ is uniform over
$\{\sigma \in \{\pm 1\}^n : \sigma^\top \mathbf{1} = 2K - n\}$ for $K = \lceil \rho n \rceil$ with
$\rho \in (0, 1/2]$.

Denote by $a^*(\epsilon)$ the optimal recovery threshold, namely, the infimum of $a > 0$ such that exact
cluster recovery is possible with probability converging to one as $n \to \infty$. Our results show that
for all $\epsilon \in [0, 1/2]$, the optimal recovery threshold is given by

\[ a^*(\epsilon) = \frac{1}{(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2} \tag{14} \]

and can be achieved by the SDP relaxations. The optimal recovery threshold is insensitive to $\rho$,
which is in contrast to what we have seen for the binary stochastic block model.
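To make (14) concrete, the following sketch evaluates the threshold constant, assuming the form $a^*(\epsilon) = 1/(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2$ implied by Theorems 6 and 7. The function name is hypothetical, and the case $\epsilon = 1/2$ (where the threshold is infinite) is excluded.

```python
import math

def censored_threshold(eps):
    """Optimal constant a*(eps) = 1 / (sqrt(1 - eps) - sqrt(eps))^2 for 0 <= eps < 1/2."""
    if not 0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 1/2)")
    return 1.0 / (math.sqrt(1 - eps) - math.sqrt(eps)) ** 2

def exact_recovery_possible(a, eps):
    """Condition of Theorem 6: a * (sqrt(1 - eps) - sqrt(eps))^2 > 1."""
    return a > censored_threshold(eps)
```

Since $(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2 = 1 - 2\sqrt{\epsilon(1-\epsilon)}$, noiseless labels give $a^*(0) = 1$, and a 10% flip rate raises the required constant to $a^*(0.1) = 2.5$.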

Exact cluster recovery in the censored block model is previously studied in [1] and it is shown
that if $\epsilon \to 1/2$, the maximum likelihood estimator achieves the optimal recovery threshold
$a(1 - 2\epsilon)^2 > 2 + o(1)$, while an SDP relaxation of the ML estimator succeeds if
$a(1 - 2\epsilon)^2 > 4 + o(1)$. The optimal recovery threshold for any fixed $\epsilon \in (0, 1/2)$ and
whether it can be achieved in polynomial time were previously unknown. Theorem 6 and Theorem 7 together
show that the SDP relaxation achieves the optimal recovery threshold
$a(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2 > 1$ for any fixed constant $\epsilon \in [0, 1/2]$.
Notice that $(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2 = \frac{1}{2}(1 - 2\epsilon)^2 + o((1 - 2\epsilon)^2)$
when $\epsilon \to 1/2$. For the censored block model with the background graph being a random regular
graph, it is further shown in [23] that the SDP relaxations also achieve the optimal exact recovery
threshold.

The above exact recovery threshold in the regime $p = a \log n/n$ shall be contrasted with the
positively correlated recovery threshold in the sparse regime $p = a/n$ for constant $a$. In this sparse
regime, there exists at least a constant fraction of vertices with no neighbors and exactly recovering
the clusters is hopeless; instead, the goal is to find an estimator $\hat{\sigma}$ positively correlated
with $\sigma^*$ up to a global flip of signs. It was conjectured in [26] that the positively correlated
recovery is possible if and only if $a(1 - 2\epsilon)^2 > 1$; the converse part is shown in [28] and
recently it is proved in [38] that spectral algorithms achieve the sharp threshold in polynomial time.

5 An SDP for general cluster structure 

In this section we consider SDPs for the general case of multiple clusters and outliers. We assume
there are $r$ clusters with sizes $K_1, \ldots, K_r$, and $n - (K_1 + \cdots + K_r)$ outlier vertices.
Vertices in the same cluster are connected with probability $p$, while other pairs of vertices are
connected with probability $q$. We consider the asymptotic regime $p = a \log n/n$, $q = b \log n/n$ and
$K_k = \rho_k n$ as $n \to \infty$ for $a, b, \rho_1, \ldots, \rho_r$ fixed, with
$\rho_1 \ge \cdots \ge \rho_r > 0$. Let $\rho_{\min} = \rho_r$. We derive
sufficient conditions for exact recovery by SDPs. While the conditions are not the tightest possible
for specific cases, we would like to identify an algorithm that recovers the cluster matrix exactly
without knowing the details of the cluster structure. As in Section 3, the true cluster matrix can
be expressed as $Z^* = \sum_{k=1}^{r} \xi_k^* (\xi_k^*)^\top$, where $\xi_k^*$ is the indicator function
of the $k$-th cluster. Denote by $\mathcal{Z}_n$ the collection of all such cluster matrices.

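Consider the SDP

\[
\begin{aligned}
\hat{Z}_{\rm SDP} = \arg\max_{Z} \quad & \langle A, Z \rangle \\
\text{s.t.} \quad & Z \succeq 0 \\
& Z_{ii} \le 1 \\
& Z_{ij} \ge 0 \\
& \langle \mathbf{I}, Z \rangle = K_1 + \cdots + K_r \\
& \langle \mathbf{J}, Z \rangle = K_1^2 + \cdots + K_r^2.
\end{aligned} \tag{15}
\]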

Implementing the SDP (15) requires no knowledge of the density parameters $a$ and $b$, the number of
clusters $r$, or the sizes of the individual clusters; but it does require the exact knowledge of the sum
as well as the sum of squares of the cluster sizes, which, in practical applications, may be unrealistic
to assume. Therefore, similar to (5), we also consider the following penalized SDP, obtained by
removing the constraints for those two quantities while augmenting the objective function:

\[
\begin{aligned}
\hat{Z}_{\rm SDP} = \arg\max_{Z} \quad & \langle A, Z \rangle - \eta^* \langle \mathbf{I}, Z \rangle - \lambda^* \langle \mathbf{J}, Z \rangle \\
\text{s.t.} \quad & Z \succeq 0 \\
& Z_{ii} \le 1 \\
& Z_{ij} \ge 0.
\end{aligned} \tag{16}
\]

Here the penalization parameters $\eta^*$ and $\lambda^*$ must be specified.

Clearly the above two SDPs are different and need not have the same solutions; nevertheless,
they are similar enough so that in the following theorem we state a sufficient condition for either
of the SDPs to exactly recover $Z^*$ with high probability. Define

\[ I(\mu, d) = \mu - d \log \frac{e\mu}{d}. \tag{17} \]

For $\mu > 0$ fixed, $I(\mu, d)$ is a strictly convex, nonnegative function in $d$ which is zero if and
only if $d = \mu$.

Theorem 8. Suppose there exist $\psi_1 \ge 0$ and $\psi_2 \ge 0$ with $b + \psi_1 + \psi_2 < a$ such that

\[ I(a, b + \psi_1 + \psi_2) > 1/\rho_r \tag{18} \]

\[ I(b, b + \psi_1) > 1/\rho_r \tag{19} \]

\[ I(b, b + \psi_2) > 1/\rho_{r-1} \tag{20} \]

\[ I(b, b + \psi_1 + \psi_2) > 1/\rho_r \tag{21} \]

(with the understanding that (19) and (20) can be dropped if there is only one cluster (i.e. $r = 1$)
and (21) can be dropped unless there is only one cluster plus outlier vertices). Let
$\eta^* = C\sqrt{\log n}$ for a sufficiently large constant $C$ and let
$\lambda^* = \frac{(b + \psi_1 + \psi_2) \log n}{n}$. If $\hat{Z}_{\rm SDP}$ is produced by either SDP
(15) or SDP (16), then
$\min_{Z^* \in \mathcal{Z}_n} \mathbb{P}\{\hat{Z}_{\rm SDP} = Z^*\} \ge 1 - n^{-\Omega(1)}$.

We examine two simpler sufficient conditions for recovery, assuming we have enough information
to implement one of the two SDPs, and we also have a lower bound $\rho_{\min}$ on the $\rho_k$'s, but we
don't know how many clusters there are nor whether there are outlier vertices. The conditions of Theorem
8 are most stringent when there are two clusters of the smallest possible size $\rho_{\min}$, and in that
case we get the tightest result from the theorem by selecting $\psi_1 = \psi_2$, yielding the following
corollary:

Corollary 1. Let $\psi$ be the solution to $I(a, b + 2\psi) = I(b, b + \psi)$. (It satisfies
$b < b + 2\psi < a$.) If $I(b, b + \psi) > 1/\rho_{\min}$, then
$\min_{Z^* \in \mathcal{Z}_n} \mathbb{P}\{\hat{Z}_{\rm SDP} = Z^*\} \ge 1 - n^{-\Omega(1)}$.

There is no simple expression for $\psi$ in Corollary 1. If instead we consider the equation
$I(a, b + 2\psi) = I(b, b + 2\psi)$, we have the smaller but explicit solution $\psi = \frac{\tau - b}{2}$,
where $\tau = \frac{a-b}{\log a - \log b}$. Using this $\psi$ in the test $I(b, b + \psi) > 1/\rho_{\min}$,
we obtain the following weaker but more explicit recovery condition, which, nevertheless, is within a
factor of eight of the necessary condition (see Remark 2 below):

Corollary 2. If $I(b, \frac{\tau + b}{2}) > 1/\rho_{\min}$, then
$\min_{Z^* \in \mathcal{Z}_n} \mathbb{P}\{\hat{Z}_{\rm SDP} = Z^*\} \ge 1 - n^{-\Omega(1)}$. (If SDP (16)
is used, it is assumed that $\eta^*$ and $\lambda^*$ are selected as in Theorem 8, namely,
$\eta^* = C\sqrt{\log n}$ and $\lambda^* = \frac{\tau \log n}{n}$, where $\tau = \frac{a-b}{\log(a/b)}$.)
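The two corollaries can be compared numerically. The sketch below assumes $I(\mu, d) = \mu - d \log(e\mu/d)$ and solves the equation of Corollary 1 by bisection, which is a technique chosen here for illustration (the paper does not prescribe a solver); all function names are hypothetical.

```python
import math

def I(mu, d):
    """I(mu, d) = mu - d * log(e * mu / d), with I(mu, 0) = mu (convention 0 log 0 = 0)."""
    if d == 0:
        return mu
    return mu - d * math.log(math.e * mu / d)

def tau(a, b):
    """tau = (a - b) / (log a - log b), which lies strictly between b and a."""
    return (a - b) / (math.log(a) - math.log(b))

def psi_corollary1(a, b, tol=1e-12):
    """Bisection for the root of I(a, b + 2 psi) - I(b, b + psi) on (0, (a - b) / 2)."""
    def f(psi):
        return I(a, b + 2 * psi) - I(b, b + psi)
    lo, hi = 0.0, (a - b) / 2   # f(lo) = I(a, b) > 0 and f(hi) = -I(b, (a + b) / 2) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def psi_corollary2(a, b):
    """Explicit choice psi = (tau - b) / 2 solving I(a, b + 2 psi) = I(b, b + 2 psi)."""
    return (tau(a, b) - b) / 2

def sufficient_condition(a, b, rho_min, psi):
    """Test I(b, b + psi) > 1 / rho_min used in both corollaries."""
    return I(b, b + psi) > 1 / rho_min
```

Because $I(b, \cdot)$ is increasing to the right of $b$, the explicit $\psi$ of Corollary 2 is smaller than the root of Corollary 1 and therefore yields a more conservative test.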
Remark 2. Let us compare the sufficient condition provided by Corollary 2 with necessary conditions
for recovery. In the presence of outliers, $I(b, \tau) > 1/\rho_{\min}$ is a necessary condition as shown in
[24, Theorem 4], for otherwise we can swap a vertex in the smallest cluster with an outlier vertex
to increase the number of in-cluster edges. Also, with at least two clusters,

\[ (\sqrt{a} - \sqrt{b})^2 > 1/\rho_{\min} \tag{22} \]

is necessary, because we could have two smallest clusters of sizes $\rho_{\min} n$, and even if a genie were
to reveal all the other clusters, we would still need (22) to recover the two smallest ones, as shown
by [2, Theorem 1]. By Lemma 11, $I(b, \tau) \le (\sqrt{a} - \sqrt{b})^2 \le 2I(b, \tau)$; so with or without
outliers, $2I(b, \tau) > 1/\rho_{\min}$ is necessary. By Lemma 12, $I(b, \tau) \le 4I(b, \frac{\tau + b}{2})$.
Therefore we conclude that the sufficient condition of Corollary 2 is within a factor of four (resp. eight)
of the necessary condition in the presence (resp. absence) of outliers.
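The two inequalities quoted from Lemmas 11 and 12 can be spot-checked for particular values of $a$ and $b$. The sketch below is only a numerical illustration at a few points, not a proof, and it assumes the definition $I(\mu, d) = \mu - d\log(e\mu/d)$ with $\tau = (a-b)/(\log a - \log b)$; the function names are hypothetical.

```python
import math

def I(mu, d):
    """I(mu, d) = mu - d * log(e * mu / d) for mu, d > 0."""
    return mu - d * math.log(math.e * mu / d)

def remark2_inequalities(a, b):
    """Return (Lemma 11 sandwich holds, Lemma 12 bound holds) for the given a > b > 0."""
    t = (a - b) / (math.log(a) - math.log(b))
    gap = (math.sqrt(a) - math.sqrt(b)) ** 2
    lemma11 = I(b, t) <= gap <= 2 * I(b, t)
    lemma12 = I(b, t) <= 4 * I(b, (t + b) / 2)
    return lemma11, lemma12
```

For $a = 4$, $b = 1$ one finds $I(b, \tau) \approx 0.507$ against $(\sqrt{a} - \sqrt{b})^2 = 1$, so the upper inequality of Lemma 11 is nearly tight there.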

6 Conclusions 

This paper shows that the SDP procedure works for recovering community structure at the asymptotically
optimal threshold in various important settings beyond the case of two equal-sized clusters
or that of a single cluster and outliers considered in [24]. In particular, SDP relaxations work
asymptotically optimally for two unequal clusters (with or without knowing the cluster size), or $r$
equal clusters, or the binary censored block model with the background graph being Erdős-Rényi.
These results demonstrate the versatility of SDP relaxation as a simple, general purpose,
computationally feasible methodology for community detection.

The picture is less impressive when these cases are combined to have a general case with r 
clusters of various sizes plus outliers. Still, we found that an SDP procedure can achieve exact 
recovery even without the knowledge of the cluster sizes; the sufficient condition for recovery is 
within a factor of eight of the necessary information-theoretic bound. An interesting open problem 
is whether the SDP relaxation can achieve the optimal recovery threshold in this general case. The 
preprint [37] addresses this problem, showing that the SDP relaxation still achieves the optimal 
threshold for recovering a fixed number of clusters with unequal sizes. 

7 Proofs 

7.1 Proofs for Section 2: Binary asymmetric SBM 

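Lemma 1 ([24, Lemma 2]). Let $X \sim {\rm Binom}(m, \frac{a \log n}{n})$ and
$R \sim {\rm Binom}(m, \frac{b \log n}{n})$ for $m \in \mathbb{N}$ and $a, b > 0$, where $m = \rho n + o(n)$
for some $\rho > 0$ as $n \to \infty$. Let $k_n, k'_n \in [m]$ be such that
$k_n = \tau \rho \log n + o(\log n)$ and $k'_n = \tau' \rho \log n + o(\log n)$ for some $0 < \tau \le a$
and $\tau' \ge b$. Then

\[ \mathbb{P}\{X \le k_n\} = n^{-\rho \left( a - \tau \log \frac{ea}{\tau} + o(1) \right)} \tag{23} \]

\[ \mathbb{P}\{R \ge k'_n\} = n^{-\rho \left( b - \tau' \log \frac{eb}{\tau'} + o(1) \right)} \tag{24} \]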
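The exponent in (23) can be compared against an exact binomial tail for a moderate $n$. The sketch below checks only the non-asymptotic Chernoff upper bound $\mathbb{P}\{X \le k\} \le n^{-\rho(a - \tau\log(ea/\tau))}$, which holds when $m = \rho n$ exactly and $k = \tau\rho\log n$ with $\tau \le a$; the matching lower bound in (23) is asymptotic and is not tested here. The function names are hypothetical.

```python
import math

def binom_lower_tail(m, p, k):
    """Exact P{X <= k} for X ~ Binom(m, p), with each term computed in log space."""
    total = 0.0
    for j in range(k + 1):
        log_pmf = (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                   + j * math.log(p) + (m - j) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

def lemma1_exponent(rho, a, tau):
    """rho * (a - tau * log(e * a / tau)), the exponent appearing in (23)."""
    return rho * (a - tau * math.log(math.e * a / tau))

def tail_and_bound(n, rho, a, k):
    """Return (exact lower tail, Chernoff bound n^{-exponent}) for m = round(rho * n)."""
    m = round(rho * n)
    p = a * math.log(n) / n
    tau = k / (rho * math.log(n))   # chosen so that k = tau * rho * log n
    return binom_lower_tail(m, p, k), n ** (-lemma1_exponent(rho, a, tau))
```

Calling `tail_and_bound(2000, 0.5, 4.0, 5)` returns an exact tail that sits below the bound; the remaining gap reflects the $o(1)$ term in the exponent at this finite $n$.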

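Lemma 2. Suppose $a, b > 0$, $\alpha \in \mathbb{R}$, and either $\rho_1 > 0$ or $\rho_2 > 0$. Let $X$ and $R$ be independent with 
$X \sim \mathrm{Binom}(m_1, \frac{a\log n}{n})$ and $R \sim \mathrm{Binom}(m_2, \frac{b\log n}{n})$, where $m_1 = \rho_1 n + o(n)$ and $m_2 = \rho_2 n + o(n)$ 
as $n \to \infty$. Let $k \in \mathbb{N}$ be such that $k = \alpha\log n + o(\log n)$. If $\alpha \le a\rho_1 - b\rho_2$, 

$$\mathbb{P}\{X - R \le k\} = n^{-g(\rho_1,\rho_2,a,b,\alpha)+o(1)}, \tag{25}$$

where 

$$g(\rho_1,\rho_2,a,b,\alpha) = \begin{cases} a\rho_1 + b\rho_2 - \gamma - \frac{\alpha}{2}\log\frac{(\gamma-\alpha)a\rho_1}{(\gamma+\alpha)b\rho_2} & \rho_1,\rho_2 > 0 \\ \rho_2 b + \alpha\log\frac{e\rho_2 b}{-\alpha} & \rho_1 = 0,\ \rho_2 > 0 \\ \rho_1 a - \alpha\log\frac{e\rho_1 a}{\alpha} & \rho_1 > 0,\ \rho_2 = 0 \end{cases}$$

with $\gamma = \sqrt{\alpha^2 + 4\rho_1\rho_2 ab}$. 

Furthermore, for any $m_1, m_2, k \in \mathbb{N}$ such that $k \le (m_1 a - m_2 b)\log n/n$, 

$$\mathbb{P}\{X - R \le k\} \le n^{-g(m_1/n,\, m_2/n,\, a,\, b,\, k/\log n)}. \tag{26}$$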
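
The exponent in Lemma 2 is the optimal value of a one-dimensional Chernoff optimization, so it can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: the function names `g`, `chernoff_exponent` and `sup_on_grid` are hypothetical, and the grid search is only an illustration. It compares the closed form of $g$ with a brute-force maximization over $t \ge 0$ of the leading-order objective $-\alpha t + \rho_1 a(1-e^{-t}) + \rho_2 b(1-e^{t})$.

```python
import math


def g(rho1, rho2, a, b, alpha):
    """Closed-form exponent from Lemma 2 (assumes alpha <= a*rho1 - b*rho2)."""
    if rho1 > 0 and rho2 > 0:
        gamma = math.sqrt(alpha ** 2 + 4 * rho1 * rho2 * a * b)
        ratio = (gamma - alpha) * a * rho1 / ((gamma + alpha) * b * rho2)
        return a * rho1 + b * rho2 - gamma - 0.5 * alpha * math.log(ratio)
    if rho1 == 0 and rho2 > 0:
        # Here alpha <= -b*rho2 < 0, so the logarithm's argument is positive.
        return rho2 * b + alpha * math.log(math.e * rho2 * b / (-alpha))
    if rho1 > 0 and rho2 == 0:
        # This branch of the sketch only handles 0 < alpha <= a*rho1.
        return rho1 * a - alpha * math.log(math.e * rho1 * a / alpha)
    raise ValueError("need rho1 > 0 or rho2 > 0")


def chernoff_exponent(rho1, rho2, a, b, alpha, t):
    """Leading-order Chernoff objective, in units of log(n)/n."""
    return -alpha * t + rho1 * a * (1 - math.exp(-t)) + rho2 * b * (1 - math.exp(t))


def sup_on_grid(rho1, rho2, a, b, alpha, t_max=5.0, steps=20000):
    """Brute-force maximum of the objective over a uniform grid on [0, t_max]."""
    return max(
        chernoff_exponent(rho1, rho2, a, b, alpha, t_max * i / steps)
        for i in range(steps + 1)
    )
```

For two equal-sized clusters with $\alpha = 0$ the closed form reduces to $\frac{1}{2}(\sqrt{a}-\sqrt{b})^2$, which is the quantity $\eta(1/2,a,b)$ used later in this section.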

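Proof. We first prove the upper tail bound in (26) using Chernoff's bound. In particular, 

$$\mathbb{P}\{X - R \le k\} \le \exp(-n\ell(k/n)), \tag{27}$$

where $\ell(x) = \sup_{t \ge 0}\, -tx - \frac{1}{n}\log\mathbb{E}[e^{-t(X-R)}]$. Let $\rho_{1,n} = m_1/n$, $\rho_{2,n} = m_2/n$ and $\alpha_n = k/\log n$. 
By definition, 

$$\frac{1}{n}\log\mathbb{E}\left[e^{-t(X-R)}\right] = \rho_{1,n}\log\left(1 - \frac{a\log n}{n}(1-e^{-t})\right) + \rho_{2,n}\log\left(1 - \frac{b\log n}{n}(1-e^{t})\right).$$

Since $-tx - \frac{1}{n}\log\mathbb{E}[e^{-t(X-R)}]$ is concave in $t$, it achieves the supremum at $t^*$ such that 

$$-x + \rho_{1,n}\frac{a e^{-t^*}\log n/n}{1 - \frac{a\log n}{n}(1-e^{-t^*})} - \rho_{2,n}\frac{b e^{t^*}\log n/n}{1 - \frac{b\log n}{n}(1-e^{t^*})} = 0.$$

It suggests that when $x = k/n$, we choose 

$$t^* = \begin{cases} \log\frac{\gamma_n - \alpha_n}{2\rho_{2,n}b} & \rho_{1,n},\rho_{2,n} > 0 \\ \log\frac{-\alpha_n}{\rho_{2,n}b} & \rho_{1,n} = 0,\ \rho_{2,n} > 0 \\ \log\frac{\rho_{1,n}a}{\alpha_n} & \rho_{1,n} > 0,\ \rho_{2,n} = 0 \end{cases}$$

with $\gamma_n = \sqrt{\alpha_n^2 + 4\rho_{1,n}\rho_{2,n}ab}$. Thus, using the inequality $\log(1-x) \le -x$, we have 

$$\ell(k/n) \ge \left(-\alpha_n t^* + \rho_{1,n}a + \rho_{2,n}b - \rho_{1,n}a e^{-t^*} - \rho_{2,n}b e^{t^*}\right)\frac{\log n}{n} = \frac{\log n}{n}\, g(\rho_{1,n},\rho_{2,n},a,b,\alpha_n).$$

Then in view of (27), 

$$\mathbb{P}\{X - R \le k\} \le n^{-g(\rho_{1,n},\rho_{2,n},a,b,\alpha_n)}.$$

If $\rho_{1,n} = \rho_1 + o(1)$, $\rho_{2,n} = \rho_2 + o(1)$, and $\alpha_n = \alpha + o(1)$, then we let 

$$t^* = \begin{cases} \log\frac{\gamma - \alpha}{2\rho_2 b} & \rho_1,\rho_2 > 0 \\ \log\frac{-\alpha}{\rho_2 b} & \rho_1 = 0,\ \rho_2 > 0 \\ \log\frac{\rho_1 a}{\alpha} & \rho_1 > 0,\ \rho_2 = 0 \end{cases}$$

with $\gamma = \sqrt{\alpha^2 + 4\rho_1\rho_2 ab}$. It follows that 

$$\ell(k/n) \ge \left(-\alpha t^* + \rho_1 a + \rho_2 b - \rho_1 a e^{-t^*} - \rho_2 b e^{t^*}\right)\frac{\log n}{n} + o(\log n/n) = \frac{\log n}{n}\, g(\rho_1,\rho_2,a,b,\alpha) + o(\log n/n),$$

and thus the upper tail bound in (25) holds in view of (27). Next, we prove the lower tail bound 
in (25). 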

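Case 1: $\rho_1,\rho_2 > 0$. For any choice of the constant $\alpha'$ with $\alpha' > |\alpha|$, 

$$\{X - R \le \alpha\log n\} \supset \left\{X \le \frac{\alpha'+\alpha}{2}\log n\right\} \cap \left\{R \ge \frac{\alpha'-\alpha}{2}\log n\right\}$$

and therefore 

$$\mathbb{P}\{X - R \le \alpha\log n\} \ge \max_{\alpha':\alpha'>|\alpha|} \mathbb{P}\left\{X \le \frac{\alpha'+\alpha}{2}\log n\right\}\mathbb{P}\left\{R \ge \frac{\alpha'-\alpha}{2}\log n\right\}.$$

So, applying Lemma 1, we get that 

$$\mathbb{P}\{X - R \le \alpha\log n\} \ge \max_{\alpha':\alpha'>|\alpha|} n^{-\left[a\rho_1 - \frac{\alpha'+\alpha}{2}\log\frac{2ea\rho_1}{\alpha'+\alpha} + b\rho_2 - \frac{\alpha'-\alpha}{2}\log\frac{2eb\rho_2}{\alpha'-\alpha} + o(1)\right]}.$$

Setting $\alpha' = \sqrt{\alpha^2 + 4\rho_1\rho_2 ab} = \gamma$ in the last displayed equation yields 

$$\mathbb{P}\{X - R \le \alpha\log n\} \ge n^{-\left[a\rho_1 + b\rho_2 - \gamma - \frac{\alpha}{2}\log\frac{(\gamma-\alpha)a\rho_1}{(\gamma+\alpha)b\rho_2} + o(1)\right]}.$$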

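Case 2: $\rho_1 = 0$, $\rho_2 > 0$. We have 

$$\{X - R \le \alpha\log n\} \supset \{X \le 2m_1 a\log n/n\} \cap \{R \ge -\alpha\log n + 2m_1 a\log n/n\}$$

and therefore 

$$\mathbb{P}\{X - R \le \alpha\log n\} \ge \mathbb{P}\{X \le 2m_1 a\log n/n\}\,\mathbb{P}\{R \ge -\alpha\log n + 2m_1 a\log n/n\} \ge \frac{1}{2}\, n^{-\rho_2 b - \alpha\log\frac{e\rho_2 b}{-\alpha} + o(1)},$$

where the last inequality follows because by Markov's inequality, $\mathbb{P}\{X \le 2m_1 a\log n/n\} \ge 1/2$, and 
in view of Lemma 1 with $m_1\log n/n = o(\log n)$, 

$$\mathbb{P}\{R \ge -\alpha\log n + 2m_1 a\log n/n\} \ge n^{-\rho_2 b - \alpha\log\frac{e\rho_2 b}{-\alpha} + o(1)}.$$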

Case 3: $\rho_1 > 0$, $\rho_2 = 0$. Since $R \ge 0$, we have 

$$\{X - R \le \alpha\log n\} \supset \{X \le \alpha\log n\}$$

and therefore 

$$\mathbb{P}\{X - R \le \alpha\log n\} \ge \mathbb{P}\{X \le \alpha\log n\} \ge n^{-\rho_1 a + \alpha\log\frac{e\rho_1 a}{\alpha} + o(1)},$$

where the last inequality follows from Lemma 1. □ 

The following lemma provides a deterministic sufficient condition for the success of SDP (3) in 
the case $a > b$. 

Lemma 3. Suppose there exist $D^* = \mathrm{diag}\{d_i^*\}$ and $\lambda^* \in \mathbb{R}$ such that $S^* = D^* - A + \lambda^* J$ satisfies 
$S^* \succeq 0$, $\lambda_2(S^*) > 0$ and 

$$S^*\sigma^* = 0. \tag{28}$$

Then $\widehat{Y}_{\mathrm{SDP}} = Y^*$ is the unique solution to (3). 

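Proof. The Lagrangian function is given by 

$$L(Y, S, D, \lambda) = \langle A, Y\rangle + \langle S, Y\rangle - \langle D, Y - I\rangle - \lambda\left(\langle J, Y\rangle - (2K-n)^2\right),$$

where the Lagrangian multipliers are denoted by $S \succeq 0$, $D = \mathrm{diag}\{d_i\}$, and $\lambda \in \mathbb{R}$. Then for any 
$Y$ satisfying the constraints in (3), 

$$\langle A, Y\rangle \overset{(a)}{\le} L(Y, S^*, D^*, \lambda^*) = \langle D^*, I\rangle + \lambda^*(2K-n)^2 = \langle D^*, Y^*\rangle + \lambda^*(2K-n)^2 = \langle A + S^* - \lambda^* J, Y^*\rangle + \lambda^*(2K-n)^2 \overset{(b)}{=} \langle A, Y^*\rangle,$$

where (a) holds because $\langle S^*, Y\rangle \ge 0$; (b) holds because $\langle Y^*, S^*\rangle = (\sigma^*)^\top S^*\sigma^* = 0$ by (28). Hence, 
$Y^*$ is an optimal solution. It remains to establish its uniqueness. To this end, suppose $\widetilde{Y}$ is an 
optimal solution. Then, 

$$\langle S^*, \widetilde{Y}\rangle = \langle D^* - A + \lambda^* J, \widetilde{Y}\rangle \overset{(a)}{=} \langle D^* - A + \lambda^* J, Y^*\rangle = \langle S^*, Y^*\rangle = 0,$$

where (a) holds because $\langle J, \widetilde{Y}\rangle = \langle J, Y^*\rangle$, $\langle A, \widetilde{Y}\rangle = \langle A, Y^*\rangle$, and $\widetilde{Y}_{ii} = Y^*_{ii} = 1$ for all $i \in [n]$. In 
view of (28), since $\widetilde{Y} \succeq 0$ and $S^* \succeq 0$ with $\lambda_2(S^*) > 0$, $\widetilde{Y}$ must be a multiple of $Y^* = \sigma^*(\sigma^*)^\top$. Because 
$\widetilde{Y}_{ii} = 1$ for all $i \in [n]$, $\widetilde{Y} = Y^*$. □ 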


Proof of Theorem 1. Let $D^* = \mathrm{diag}\{d_i^*\}$ with 

$$d_i^* = \sum_{j=1}^n A_{ij}\sigma_i^*\sigma_j^* - \lambda^*(2K-n)\sigma_i^* \tag{29}$$

and choose $\lambda^* = \tau\log n/n$, where $\tau = \frac{a-b}{\log a - \log b}$. It suffices to show that $S^* = D^* - A + \lambda^* J$ satisfies 
the conditions in Lemma 3 with high probability. 

By definition, $d_i^*\sigma_i^* = \sum_j A_{ij}\sigma_j^* - \lambda^*(2K-n)$ for all $i \in [n]$, i.e., $D^*\sigma^* = A\sigma^* - \lambda^*(2K-n)\mathbf{1}$. 
Since $J\sigma^* = (2K-n)\mathbf{1}$, it follows that the desired (28) holds, that is, $S^*\sigma^* = 0$. It remains to 
verify that $S^* \succeq 0$ and $\lambda_2(S^*) > 0$ with high probability, which amounts to showing that 

$$\mathbb{P}\left\{\inf_{x \perp \sigma^*,\, \|x\|_2 = 1} x^\top S^* x > 0\right\} \ge 1 - n^{-\Omega(1)}. \tag{30}$$
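
The identity $S^*\sigma^* = 0$ is purely algebraic: it holds for every realization of $A$ and every value of $\lambda^*$, not only for $\lambda^* = \tau\log n/n$. The sketch below illustrates this on a small random instance using exact rational arithmetic. The function name `certificate_residual` and the parameter values in the usage note are hypothetical choices made for illustration only.

```python
import random
from fractions import Fraction


def certificate_residual(n, K, p, q, lam, seed=0):
    """Return the vector S* sigma* for S* = D* - A + lam*J, with d_i* as in (29)."""
    rng = random.Random(seed)
    sigma = [1] * K + [-1] * (n - K)  # first K vertices form cluster 1

    # Symmetric, zero-diagonal adjacency matrix of a two-cluster block model.
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if sigma[i] == sigma[j] else q
            A[i][j] = A[j][i] = 1 if rng.random() < prob else 0

    # d_i* = sum_j A_ij sigma_i sigma_j - lam * (2K - n) * sigma_i
    d = [
        sum(A[i][j] * sigma[i] * sigma[j] for j in range(n)) - lam * (2 * K - n) * sigma[i]
        for i in range(n)
    ]

    # S*_ij = d_i [i == j] - A_ij + lam   (J is the all-ones matrix)
    def s_entry(i, j):
        return (d[i] if i == j else 0) - A[i][j] + lam

    return [sum(s_entry(i, j) * sigma[j] for j in range(n)) for i in range(n)]
```

Calling `certificate_residual(12, 5, 0.7, 0.2, Fraction(3, 7))` returns a list of exact zeros. The probabilistic part of the proof concerns only positive semidefiniteness and the spectral gap of $S^*$, which this sketch does not check.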


Note that $\mathbb{E}[A] = \frac{p-q}{2}Y^* + \frac{p+q}{2}J - pI$ and $Y^* = \sigma^*(\sigma^*)^\top$. Thus for any $x$ such that $x \perp \sigma^*$ and 
$\|x\|_2 = 1$, 

$$\begin{aligned} x^\top S^* x &= x^\top D^* x - x^\top\mathbb{E}[A]x + \lambda^* x^\top Jx - x^\top(A - \mathbb{E}[A])x \\ &= x^\top D^* x - \frac{p-q}{2}x^\top Y^* x + \left(\lambda^* - \frac{p+q}{2}\right)x^\top Jx + p - x^\top(A - \mathbb{E}[A])x \\ &\overset{(a)}{=} x^\top D^* x + \left(\lambda^* - \frac{p+q}{2}\right)x^\top Jx + p - x^\top(A - \mathbb{E}[A])x, \end{aligned} \tag{31}$$

where (a) holds because $\langle x, \sigma^*\rangle = 0$. It follows from (31) that for any $x \perp \sigma^*$ and $\|x\|_2 = 1$, 
$x^\top S^* x = t_1(x) + t_2(x)$ where 

$$t_1(x) = x^\top D^* x + \left(\lambda^* - \frac{p+q}{2}\right)x^\top Jx, \tag{32}$$

$$t_2(x) = p - x^\top(A - \mathbb{E}[A])x. \tag{33}$$

Observe that 

$$\inf_{x\perp\sigma^*,\|x\|_2=1} x^\top S^* x \ge \inf_{x\perp\sigma^*,\|x\|_2=1} t_1(x) + \inf_{x\perp\sigma^*,\|x\|_2=1} t_2(x). \tag{34}$$

Now $\inf_{x\perp\sigma^*,\|x\|_2=1} t_2(x) \ge p - \|A - \mathbb{E}[A]\|$. In view of [24, Theorem 5], with high probability 
$\|A - \mathbb{E}[A]\| \le c'\sqrt{\log n}$ for a positive constant $c'$ depending only on $a$, and thus $\inf_{x\perp\sigma^*,\|x\|_2=1} t_2(x) \ge p - c'\sqrt{\log n}$. 

We next bound $\inf_{x\perp\sigma^*,\|x\|_2=1} t_1(x)$ from below. Consider the specific vector $\check{x}$ that maximizes 
$x^\top Jx$ subject to the unit norm constraint and $\langle x, \sigma^*\rangle = 0$. It has coordinates $\sqrt{\frac{n-K}{nK}}$ for the 
$K$ vertices of the first cluster and coordinates $\sqrt{\frac{K}{n(n-K)}}$ for the $n-K$ vertices of the other cluster. 

Let $E_2 = \mathrm{span}(\sigma^*, \check{x})$; $E_2$ is the set of vectors that are constant over each cluster. Then 

$$\inf_{x\perp\sigma^*,\|x\|_2=1} t_1(x) = \inf_{\beta\in[0,1],\ \|x\|_2=1,\ x\perp E_2} t_1\left(\beta\check{x} + \sqrt{1-\beta^2}\,x\right).$$

Notice that for any vector $x$ with $x \perp E_2$, $Jx = 0$. It follows that 

$$t_1\left(\beta\check{x} + \sqrt{1-\beta^2}\,x\right) = \beta^2 t_1(\check{x}) + 2\beta\sqrt{1-\beta^2}\,x^\top D^*\check{x} + (1-\beta^2)x^\top D^* x.$$

Therefore, 

$$\inf_{x\perp\sigma^*,\|x\|_2=1} t_1(x) \ge \inf_{\beta\in[0,1]}\left(\beta^2 t_1(\check{x}) + 2\beta\sqrt{1-\beta^2}\inf_{\|x\|_2=1,\,x\perp E_2} x^\top D^*\check{x} + (1-\beta^2)\inf_{\|x\|_2=1,\,x\perp E_2} x^\top D^* x\right).$$

We bound the three terms in the parenthesis separately in the sequel. 

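Lower bound on $t_1(\check{x})$: Notice that $\check{x}^\top J\check{x} = 4K(n-K)/n$ and thus 

$$\left(\lambda^* - \frac{p+q}{2}\right)\check{x}^\top J\check{x} = \left(\tau - \frac{a+b}{2}\right)4K(n-K)\log n/n^2 = (\tau - a + \tau - b)\,2K(n-K)\log n/n^2.$$

If $\sigma_i^* = +1$ then 

$$\mathbb{E}[d_i^*] = d_+ \triangleq \{K(a-\tau) + (n-K)(\tau-b) - a\}\log n/n.$$

If $\sigma_i^* = -1$ then 

$$\mathbb{E}[d_i^*] = d_- \triangleq \{(n-K)(a-\tau) + K(\tau-b) - a\}\log n/n.$$

Therefore, 

$$\mathbb{E}\left[\check{x}^\top D^*\check{x}\right] = (n-K)d_+/n + Kd_-/n = \left[2K(n-K)(a-\tau) + \left(K^2 + (n-K)^2\right)(\tau-b) - na\right]\log n/n^2.$$

Since $\check{x}^\top D^*\check{x} = \langle A, B\rangle - \lambda^*(2K-n)\sum_{i=1}^n \check{x}_i^2\sigma_i^*$, where $B_{ij} = \sigma_i^*\sigma_j^*\check{x}_i^2$, it follows that $\check{x}^\top D^*\check{x}$ is Lipschitz 
continuous in $A$ with Lipschitz constant $\|B\|_F = \sqrt{(1-\rho)^2/\rho + \rho^2/(1-\rho)} + o(1)$. Moreover, 
$A_{ij}$ is $[0,1]$-valued. It follows from Talagrand's concentration inequality for Lipschitz convex 
functions (see, e.g., [40, Theorem 2.1.13]) that for any $c > 0$, there exists $c' > 0$ only depending on 
$\rho$, such that 

$$\mathbb{P}\left\{\check{x}^\top D^*\check{x} - \mathbb{E}\left[\check{x}^\top D^*\check{x}\right] \ge -c'\sqrt{\log n}\right\} \ge 1 - n^{-c}.$$

Hence, with probability at least $1 - n^{-c}$, 

$$t_1(\check{x}) \ge \mathbb{E}\left[\check{x}^\top D^*\check{x}\right] + \left(\lambda^* - \frac{p+q}{2}\right)\check{x}^\top J\check{x} - c'\sqrt{\log n} = (\tau-b)\log n - a\log n/n - c'\sqrt{\log n}. \tag{35}$$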
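
The deterministic part of (35) is an algebraic identity in $n$, $K$, $a$, $b$ and can be checked numerically. The sketch below is an illustration only: the helper name `check_vector_identities` and the test values are hypothetical. It builds the vector $\check{x}$, evaluates its norm, its inner product with $\sigma^*$, and $\check{x}^\top J\check{x}$, and compares $\mathbb{E}[\check{x}^\top D^*\check{x}] + (\lambda^* - \frac{p+q}{2})\check{x}^\top J\check{x}$ with $(\tau-b)\log n - a\log n/n$.

```python
import math


def check_vector_identities(n, K, a, b):
    """Return (||x||^2, <x, sigma*>, x^T J x, lhs, rhs) for the vector x-check."""
    tau = (a - b) / (math.log(a) - math.log(b))
    unit = math.log(n) / n  # p = a*unit, q = b*unit, lambda* = tau*unit

    sigma = [1.0] * K + [-1.0] * (n - K)
    x = [math.sqrt((n - K) / (n * K))] * K + [math.sqrt(K / (n * (n - K)))] * (n - K)

    norm_sq = sum(v * v for v in x)
    inner = sum(v * s for v, s in zip(x, sigma))
    x_j_x = sum(x) ** 2  # x^T J x = (sum of coordinates)^2

    # Expected diagonal entries for the two clusters.
    d_plus = (K * (a - tau) + (n - K) * (tau - b) - a) * unit
    d_minus = ((n - K) * (a - tau) + K * (tau - b) - a) * unit
    mean_x_d_x = sum(v * v * (d_plus if s > 0 else d_minus) for v, s in zip(x, sigma))

    lhs = mean_x_d_x + (tau - (a + b) / 2) * unit * x_j_x
    rhs = (tau - b) * math.log(n) - a * math.log(n) / n
    return norm_sq, inner, x_j_x, lhs, rhs
```

For instance, with $n = 50$ and $K = 20$ the third returned value equals $4K(n-K)/n = 48$ up to rounding, and the last two values agree.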


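Lower bound on $\inf_{\|x\|_2=1,\,x\perp E_2} x^\top D^*\check{x}$: Note that $\mathbb{E}[D^*]\check{x} \in E_2$. So for any vector $x$ with 
$x \perp E_2$, $x^\top\mathbb{E}[D^*]\check{x} = 0$. Hence, 

$$\inf_{\|x\|_2=1,\,x\perp E_2} x^\top D^*\check{x} = \inf_{\|x\|_2=1,\,x\perp E_2} x^\top(D^* - \mathbb{E}[D^*])\check{x} \ge -\|(D^* - \mathbb{E}[D^*])\check{x}\|. \tag{36}$$

Notice that 

$$\|(D - \mathbb{E}[D])\check{x}\|^2 = \sum_{i=1}^n\left(\sum_{j=1}^n (A_{ij} - \mathbb{E}[A_{ij}])\sigma_i^*\sigma_j^*\right)^2\check{x}_i^2 = \sum_{i=1}^n \check{x}_i^2\left(\sum_{j=1}^n (A_{ij} - \mathbb{E}[A_{ij}])\sigma_j^*\right)^2.$$

Therefore 

$$\mathbb{E}\left[\|(D - \mathbb{E}[D])\check{x}\|\right] \le \sqrt{\mathbb{E}\left[\|(D - \mathbb{E}[D])\check{x}\|^2\right]} = \sqrt{\sum_{i=1}^n \check{x}_i^2\sum_{j=1}^n \mathrm{var}[A_{ij}]} \le \sqrt{a\log n}.$$

One can check that $\|(D - \mathbb{E}[D])\check{x}\|$ is convex and Lipschitz continuous in $A$ with Lipschitz 
constant bounded by $\max\{\sqrt{(n-K)/K}, \sqrt{K/(n-K)}\}$. In particular, for any given $A, A'$, let $D, D'$ denote the 
corresponding diagonal matrices; then 

$$\left|\,\|(D - \mathbb{E}[D])\check{x}\| - \|(D' - \mathbb{E}[D'])\check{x}\|\,\right| \le \|(D - D')\check{x}\| = \sqrt{\sum_{i=1}^n \check{x}_i^2\left(\sum_{j=1}^n (A_{ij} - A'_{ij})\sigma_j^*\right)^2} \le \|A - A'\|_F\max\left\{\sqrt{\tfrac{n-K}{K}}, \sqrt{\tfrac{K}{n-K}}\right\},$$

where the last inequality follows from the Cauchy-Schwarz inequality. It follows from 
Talagrand's concentration inequality for Lipschitz convex functions that for any $c > 0$, there exists 
$c' > 0$ such that 

$$\mathbb{P}\left\{\|(D - \mathbb{E}[D])\check{x}\| - \mathbb{E}\left[\|(D - \mathbb{E}[D])\check{x}\|\right] \le c'\sqrt{\log n}\right\} \ge 1 - n^{-c}.$$

Hence, with probability at least $1 - n^{-c}$, $\|(D^* - \mathbb{E}[D^*])\check{x}\| \le c'\sqrt{\log n}$ for some universal constant 
$c'$ and $\inf_{\|x\|_2=1,\,x\perp E_2} x^\top D^*\check{x} \ge -c'\sqrt{\log n}$ in view of (36). 

Lower bound on $\inf_{\|x\|_2=1,\,x\perp E_2} x^\top D^* x$: Notice that for $\|x\|_2 = 1$, $x^\top D^* x \ge \min_i d_i^*$, so it 
suffices to bound $\min_i d_i^*$ from below. For $i \in C_1$, $\sum_j A_{ij}\sigma_i^*\sigma_j^*$ is equal in distribution to $X - R$, 
where $X \sim \mathrm{Binom}(K-1, \frac{a\log n}{n})$ and $R \sim \mathrm{Binom}(n-K, \frac{b\log n}{n})$. It follows from Lemma 2 that 

$$\mathbb{P}\left\{\sum_j A_{ij}\sigma_i^*\sigma_j^* \le -\tau(n-2K)\log n/n + \frac{\log n}{\log\log n}\right\} \le n^{-\eta(\rho,a,b)+o(1)}.$$

For $i \in C_2$, $\sum_j A_{ij}\sigma_i^*\sigma_j^*$ is equal in distribution to $X - R$, where $X \sim \mathrm{Binom}(n-K-1, \frac{a\log n}{n})$ and 
$R \sim \mathrm{Binom}(K, \frac{b\log n}{n})$. It follows from Lemma 2 that 

$$\mathbb{P}\left\{\sum_j A_{ij}\sigma_i^*\sigma_j^* \le \tau(n-2K)\log n/n + \frac{\log n}{\log\log n}\right\} \le n^{-\eta(\rho,a,b)+o(1)}.$$

It follows from the definition of $d_i^*$ that 

$$\mathbb{P}\left\{d_i^* \ge \frac{\log n}{\log\log n}\right\} \ge 1 - n^{-\eta(\rho,a,b)+o(1)}, \quad \forall i.$$

Applying the union bound, we get that 

$$\mathbb{P}\left\{\min_{i\in[n]} d_i^* \ge \frac{\log n}{\log\log n}\right\} \ge 1 - n^{1-\eta(\rho,a,b)+o(1)} \ge 1 - n^{-\Omega(1)},$$

where the last inequality follows from the assumption that $\eta(\rho,a,b) > 1$. 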

Combining the three lower bounds, we get that with high probability, 

$$\inf_{x\perp\sigma^*,\|x\|_2=1} t_1(x) \ge \inf_{\beta\in[0,1]}\left(\beta^2(\tau-b)\log n - 2c'\beta\sqrt{1-\beta^2}\sqrt{\log n} + (1-\beta^2)\frac{\log n}{\log\log n}\right) - c'\sqrt{\log n} \ge \frac{1}{2}\min\left\{(\tau-b)\log n, \frac{\log n}{\log\log n}\right\} - 3c'\sqrt{\log n}.$$

Notice that we have shown that with high probability $\inf_{x\perp\sigma^*,\|x\|_2=1} t_2(x) \ge p - c'\sqrt{\log n}$. It follows 
from (34) that with high probability, 

$$\inf_{x\perp\sigma^*,\|x\|_2=1} x^\top S^* x \ge \frac{1}{2}\min\left\{(\tau-b)\log n, \frac{\log n}{\log\log n}\right\} - 4c'\sqrt{\log n} + p.$$

Notice that $a > b > 0$ and thus $\tau > b$. Therefore, the desired (30) holds and the theorem follows 
from Lemma 3. □ 

Proof of Theorem 2. Since the prior distribution of $\sigma^*$ is uniform over $\{\sigma \in \{\pm1\}^n : \sigma^\top\mathbf{1} = 2K-n\}$, 
the ML estimator minimizes the error probability among all estimators and thus we only need 
to find when the ML estimator fails. Let $C_1^*, C_2^*$ denote the true cluster 1 and 2, respectively. 
Let $e(i,T) = \sum_{j\in T} A_{ij}$ for a set $T$. Recall that $\tau = \frac{a-b}{\log a - \log b}$. Let $F_1$ denote the event that 
$\min_{i\in C_1^*}(e(i,C_1^*) - e(i,C_2^*)) \le -\tau(1-2\rho)\log n - 2$ and $F_2$ denote the event that $\min_{i\in C_2^*}(e(i,C_2^*) - e(i,C_1^*)) \le \tau(1-2\rho)\log n - 2$. 
Notice that $F_1 \cap F_2$ implies the existence of $i \in C_1^*$ and $j \in C_2^*$, such 
that the set $(C_1^*\setminus\{i\}\cup\{j\},\ C_2^*\setminus\{j\}\cup\{i\})$ achieves a strictly higher likelihood than $(C_1^*, C_2^*)$. Hence 
$\mathbb{P}\{\text{ML fails}\} \ge \mathbb{P}\{F_1 \cap F_2\}$. Next we bound $\mathbb{P}\{F_1\}$ and $\mathbb{P}\{F_2\}$ from below. 

By symmetry, we can condition on $C_1^*$ being the first $K$ vertices. Let $T$ denote the set of the first 
$\lfloor n/\log^2 n\rfloor$ vertices. Then 

$$\min_{i\in C_1^*}(e(i,C_1^*) - e(i,C_2^*)) \le \min_{i\in T}(e(i,C_1^*) - e(i,C_2^*)) \le \min_{i\in T}(e(i,C_1^*\setminus T) - e(i,C_2^*)) + \max_{i\in T} e(i,T). \tag{37}$$

Let $E_1, E_2$ denote the events that $\max_{i\in T} e(i,T) \le \frac{\log n}{\log\log n} - 2$ and $\min_{i\in T}(e(i,C_1^*\setminus T) - e(i,C_2^*)) \le -\tau(1-2\rho)\log n - \frac{\log n}{\log\log n}$, 
respectively. In view of (37), we have $F_1 \supset E_1 \cap E_2$ and hence it boils 
down to proving that $\mathbb{P}\{E_i\} \to 1$ for $i = 1, 2$. 

For $i \in T$, $e(i,T) \sim \mathrm{Binom}(|T|, a\log n/n)$. In view of the following Chernoff bound for binomial 
distributions [31, Theorem 4.4]: for $r \ge 1$ and $X \sim \mathrm{Binom}(n,p)$, $\mathbb{P}\{X \ge rnp\} \le (e/r)^{rnp}$, we have 

$$\mathbb{P}\left\{e(i,T) \ge \frac{\log n}{\log\log n} - 2\right\} \le \left(\frac{\log^2 n}{ae\log\log n}\right)^{-\log n/\log\log n + 2} = n^{-2+o(1)}.$$

Applying the union bound yields 

$$\mathbb{P}\{E_1\} \ge 1 - \sum_{i\in T}\mathbb{P}\left\{e(i,T) \ge \frac{\log n}{\log\log n} - 2\right\} \ge 1 - n^{-1+o(1)}.$$

Moreover, 

$$\begin{aligned} \mathbb{P}\{E_2\} &\overset{(a)}{=} 1 - \prod_{i\in T}\mathbb{P}\left\{e(i,C_1^*\setminus T) - e(i,C_2^*) > -\tau(1-2\rho)\log n - \frac{\log n}{\log\log n}\right\} \\ &\overset{(b)}{\ge} 1 - \left(1 - n^{-\eta(\rho,a,b)+o(1)}\right)^{|T|} \overset{(c)}{\ge} 1 - \exp\left(-n^{1-\eta(\rho,a,b)+o(1)}\right) \overset{(d)}{\to} 1, \end{aligned}$$

where (a) holds because $\{e(i,C_1^*\setminus T)\}_{i\in T}$ are mutually independent; (b) follows by applying Lemma 2 
and noticing that $g(\rho, 1-\rho, a, b, -\tau(1-2\rho)) = \eta(\rho,a,b)$; (c) is due to $1 + x \le e^x$ for all $x \in \mathbb{R}$; (d) 
follows from the assumption that $\eta(\rho,a,b) < 1$. Thus $\mathbb{P}\{F_1\} \to 1$. Using the same argument as 
above, we can show $\mathbb{P}\{F_2\} \to 1$. Thus the theorem follows. □ 


Proof of Theorem 3. Suppose that the true clusters are $C_1^*$ and $C_2^*$ of cardinality $K_n$ and $n - K_n$, 
respectively. One can easily check that Lemma 3 still holds with $\lambda^* = \tau\log n/n$, where $\tau = \frac{a-b}{\log a - \log b}$. 
Choose the same $d_i^*$ in (29) as in the proof of Theorem 1. It suffices to show for any 
$0 \le K_n \le n$, 

$$\mathbb{P}\left\{\inf_{x\perp\sigma^*,\|x\|_2=1} x^\top S^* x > 0\right\} \ge 1 - n^{-\Omega(1)}. \tag{38}$$

First, consider the case $K_n = 0$ or $n$, where $Y^* = J$ and the graph is simply $\mathcal{G}(n,p)$. Then for 
$i \in [n]$, $\sum_j A_{ij}\sigma_i^*\sigma_j^* \sim \mathrm{Binom}(n-1, \frac{a\log n}{n})$. Recall that $\tau = \frac{a-b}{\log a - \log b}$ and notice that in this case, 
$d_i^* = \sum_j A_{ij}\sigma_i^*\sigma_j^* - \tau\log n$. It follows from Lemma 1 that 

$$\mathbb{P}\left\{d_i^* \le \frac{\log n}{\log\log n}\right\} = \mathbb{P}\left\{\sum_j A_{ij}\sigma_i^*\sigma_j^* \le \tau\log n + \frac{\log n}{\log\log n}\right\} \le n^{-\eta(0,a,b)+o(1)} \le n^{-\eta(1/2,a,b)+o(1)},$$

where $\eta(0,a,b) = a - \tau\log(ea/\tau)$ and the last inequality follows from Lemma 13 in Appendix A. 
By the union bound, 

$$\mathbb{P}\left\{\min_{i\in[n]} d_i^* \ge \frac{\log n}{\log\log n}\right\} \ge 1 - n^{1-\eta(1/2,a,b)+o(1)} \ge 1 - n^{-\Omega(1)},$$

where the last inequality holds because $\eta(1/2,a,b) = \frac{1}{2}(\sqrt{a}-\sqrt{b})^2 > 1$ by assumption. Moreover, 
since $\sigma^* = \pm\mathbf{1}$, any $x$ such that $x \perp \sigma^*$ satisfies $x^\top Jx = 0$. It follows from (31) that 

$$x^\top S^* x = x^\top D^* x + p - x^\top(A - \mathbb{E}[A])x \ge \min_{i\in[n]} d_i^* + p - \|A - \mathbb{E}[A]\|.$$

By [24, Theorem 5], $\mathbb{P}\{\|A - \mathbb{E}[A]\| \le c'\sqrt{\log n}\} \ge 1 - o(1)$ for a positive constant $c'$ depending only 
on $a$ and thus the desired (38) follows. 

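Next, we consider the case $1 \le K_n \le n-1$. For $i \in C_1^*$, $\sum_j A_{ij}\sigma_i^*\sigma_j^*$ is stochastically larger than 
$X - R - 1$, where $X \sim \mathrm{Binom}(K_n, \frac{a\log n}{n})$ and $R \sim \mathrm{Binom}(n-K_n, \frac{b\log n}{n})$. Let $\rho_n = \frac{K_n}{n} \in (0,1)$ and 
$t_n = \tau(1-2\rho_n)\log n - \frac{\log n}{\log\log n} - 1$. Applying the non-asymptotic upper bound in Lemma 2 yields 

$$\mathbb{P}\left\{\sum_j A_{ij}\sigma_i^*\sigma_j^* \le -\lambda^*(n-2K_n) + \frac{\log n}{\log\log n}\right\} \le \mathbb{P}\{X - R \le -t_n\} \le n^{-g\left(\rho_n,\,1-\rho_n,\,a,\,b,\,-\frac{t_n}{\log n}\right)}.$$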

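We proceed to show that $g(\rho_n, 1-\rho_n, a, b, -\frac{t_n}{\log n}) \ge \eta(1/2,a,b) + o(1)$. First note that 

$$\frac{\partial g}{\partial t}(\rho_1,\rho_2,a,b,t) = -\frac{1}{2}\log\frac{a\rho_1\left(\sqrt{4ab\rho_1\rho_2 + t^2} - t\right)}{b\rho_2\left(\sqrt{4ab\rho_1\rho_2 + t^2} + t\right)}$$

and $\frac{t_n}{\log n} = \tau(\bar\rho_n - \rho_n) - \epsilon_n$, where $\epsilon_n = \frac{1}{\log\log n} + \frac{1}{\log n}$ and $\bar\rho_n = 1 - \rho_n$. Furthermore, for any 
fixed $a, b > 0$, 

$$\sup_{0<\rho<1}\ \sup_{0\le\delta\le\epsilon_n}\left|\frac{\partial g}{\partial\alpha}(\rho,\bar\rho,a,b,\tau(\bar\rho-\rho)-\delta)\right| \le F(a,b), \tag{39}$$

for some function $F(a,b)$ independent of $n$ and $\bar\rho = 1 - \rho$. To see this, let $t = \tau(\bar\rho-\rho) - \delta$, where 
$0 \le \delta \le \epsilon_n$. First consider the case of $t < 0$. Then $\bar\rho \le \frac{1}{2} + o(1) \le 2/3$. Hence $1/3 \le \rho < 1$. Then 

$$\frac{\rho\left(\sqrt{4ab\rho\bar\rho + t^2} - t\right)}{\bar\rho\left(\sqrt{4ab\rho\bar\rho + t^2} + t\right)} = \frac{\left(\sqrt{ab + (\tau^2-ab)(\bar\rho-\rho)^2 + \delta^2 + 2\delta\tau(\rho-\bar\rho)} + (\rho-\bar\rho)\tau + \delta\right)^2}{4ab\bar\rho^2}. \tag{40}$$

Since $\sqrt{ab} < \tau < \frac{a+b}{2}$ whenever $a \ne b$ and $\rho - \bar\rho \in [-1,1]$, both the numerator and denominator in 
(40) are bounded away from zero and infinity uniformly in $\rho$. The case of $t > 0$ follows analogously. 
Therefore 

$$g\left(\rho_n, 1-\rho_n, a, b, -\frac{t_n}{\log n}\right) \ge g\left(\rho_n, 1-\rho_n, a, b, -\tau(1-2\rho_n)\right) - F(a,b)\epsilon_n \tag{41}$$

$$= \eta(\rho_n,a,b) - F(a,b)\epsilon_n \tag{42}$$

$$\ge \eta(1/2,a,b) - F(a,b)\epsilon_n, \tag{43}$$

where (41) is due to (39), (42) is by definition of $\eta$, and (43) follows from Lemma 13. 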

Similarly, for $i \in C_2^*$, $\sum_j A_{ij}\sigma_i^*\sigma_j^*$ is stochastically larger than $X - R - 1$, where $X \sim \mathrm{Binom}(n-K_n, \frac{a\log n}{n})$ 
and $R \sim \mathrm{Binom}(K_n, \frac{b\log n}{n})$. Let $k'_n = \tau(1-2\rho_n)\log n + \frac{\log n}{\log\log n} + 1$. It follows from 
Lemma 2 that 

$$\mathbb{P}\left\{\sum_j A_{ij}\sigma_i^*\sigma_j^* \le \lambda^*(n-2K_n) + \frac{\log n}{\log\log n}\right\} \le \mathbb{P}\{X - R \le k'_n\} \le n^{-g(1-\rho_n,\,\rho_n,\,a,\,b,\,k'_n/\log n)} \le n^{-\eta(1/2,a,b)+o(1)},$$

where the last inequality follows from the same steps as in (41)-(43). It follows from the definition 
of $d_i^*$ that 

$$\min_{i\in[n]}\mathbb{P}\left\{d_i^* \ge \frac{\log n}{\log\log n}\right\} \ge 1 - n^{-\eta(1/2,a,b)+o(1)}.$$

Applying the union bound gives 

$$\mathbb{P}\left\{\min_{i\in[n]} d_i^* \ge \frac{\log n}{\log\log n}\right\} \ge 1 - n^{1-\eta(1/2,a,b)+o(1)} \ge 1 - n^{-\Omega(1)}, \tag{44}$$

where the last inequality follows from the assumption that $\eta(1/2,a,b) = \frac{1}{2}(\sqrt{a}-\sqrt{b})^2 > 1$. 
Furthermore, one can verify that $\inf_{x\perp\sigma^*,\|x\|_2=1} t_2(x) \ge p - O(\sqrt{\log n})$ with high probability, where the functions 
$t_1$ and $t_2$ are defined in (32)-(33). We divide the remaining analysis into two cases: 

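Case 1: $K_n \le n/\sqrt{\log n}$ or $n - K_n \le n/\sqrt{\log n}$. Notice that $\tau \le \frac{a+b}{2}$ and recall the definition 
of $\check{x}$ in the proof of Theorem 1. Then 

$$\left(\frac{p+q}{2} - \lambda^*\right)\check{x}^\top J\check{x} = 2(a+b-2\tau)K_n(n-K_n)\log n/n^2 \le 2(a+b-2\tau)\sqrt{\log n}.$$

It follows that with high probability, 

$$\inf_{x\perp\sigma^*,\|x\|_2=1} t_1(x) \ge \min_i d_i^* - \left(\frac{p+q}{2} - \lambda^*\right)\check{x}^\top J\check{x} \ge \frac{\log n}{\log\log n} - O(\sqrt{\log n}).$$

Thus the desired (38) follows by the same argument used in the proof of Theorem 1. 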

Case 2: $K_n \ge n/\sqrt{\log n}$ and $n - K_n \ge n/\sqrt{\log n}$. In this case, $\rho_n \ge 1/\sqrt{\log n}$. The proof is 
exactly the same as in the proof of Theorem 1 except that $\check{x}^\top D\check{x}$ and $\|(D - \mathbb{E}[D])\check{x}\|$ are Lipschitz 
continuous in $A$ with Lipschitz constants bounded by $O(\log^{1/4} n)$. Thus for any constant $c$, by 
Talagrand's inequality, there exists a constant $c' > 0$ such that 

$$\mathbb{P}\left\{\check{x}^\top D^*\check{x} - \mathbb{E}\left[\check{x}^\top D^*\check{x}\right] \ge -c'\log^{3/4} n\right\} \ge 1 - n^{-c},$$

$$\mathbb{P}\left\{\|(D - \mathbb{E}[D])\check{x}\| - \mathbb{E}\left[\|(D - \mathbb{E}[D])\check{x}\|\right] \le c'\log^{3/4} n\right\} \ge 1 - n^{-c}.$$

Then the desired (38) follows by the same argument used in the proof of Theorem 1. □ 


7.2 Proofs for Section 3: Multiple equal-sized clusters 

Theorem 4 is proved after three lemmas are given. For $k \in [r]$, denote by $C_k \subset [n]$ the support of 
the $k$-th cluster. For a set $T$ of vertices, let $e(i,T) = \sum_{j\in T} A_{ij}$ and $e(T',T) = \sum_{i\in T'} e(i,T)$. Let 
$k(i)$ denote the index of the cluster containing vertex $i$. Denote the number of neighbors of $i$ in 
its own cluster by $s_i = e(i, C_{k(i)})$ and the maximum number of neighbors of $i$ in other clusters by 
$r_i = \max_{k'\ne k(i)} e(i, C_{k'})$. 

Lemma 4. 

$$\mathbb{P}\left\{\min_{i\in[n]}(s_i - r_i) \le \log n/\log\log n\right\} \le r\,n^{1-(\sqrt{a}-\sqrt{b})^2/r+o(1)}.$$




Proof. Notice that $s_i \sim \mathrm{Binom}(K-1, p)$ and for $k' \ne k(i)$, $e(i, C_{k'}) \sim \mathrm{Binom}(K, q)$. It is shown in [2] 
that 

$$\mathbb{P}\{s_i - e(i, C_{k'}) \le \log n/\log\log n\} \le n^{-(\sqrt{a}-\sqrt{b})^2/r+o(1)}.$$

It follows from the union bound that 

$$\mathbb{P}\{s_i - r_i \le \log n/\log\log n\} \le r\,n^{-(\sqrt{a}-\sqrt{b})^2/r+o(1)}.$$

Applying the union bound over all possible vertices, we complete the proof. □ 

Lemma 5. There exists a constant $c > 0$ depending only on $b$ and $r$ such that 

$$\mathbb{P}\left\{\max_{k\in[r]}\frac{1}{K}\sum_{i\in C_k} r_i \le Kq + c\sqrt{\log n}\right\} \ge 1 - rn^{-2}.$$

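Proof. We first show $\mathbb{E}[r_i] \le Kq + O(\sqrt{\log n})$. Let $t_0 = \sqrt{\log n}$. Then 

$$\begin{aligned} \mathbb{E}[r_i] &= \mathbb{E}\left[\max_{k'\ne k(i)} e(i, C_{k'})\right] = \int_0^\infty \mathbb{P}\left\{\max_{k'\ne k(i)} e(i, C_{k'}) \ge t\right\}dt \\ &\le \int_0^\infty \left(r\,\mathbb{P}\{e(i, C_{k'}) \ge t\} \wedge 1\right)dt \\ &\le Kq + t_0 + r\int_{t_0}^\infty \mathbb{P}\{e(i, C_{k'}) - Kq \ge t\}\,dt \\ &\overset{(a)}{\le} Kq + t_0 + r\int_{t_0}^\infty \exp\left(-\frac{t^2}{2Kq + 2t/3}\right)dt, \end{aligned}$$

where (a) follows from Bernstein's inequality. Furthermore, 

$$\begin{aligned} \int_{t_0}^\infty \exp\left(-\frac{t^2}{2Kq + 2t/3}\right)dt &\le \int_{t_0}^{Kq} e^{-3t^2/(8Kq)}\,dt + \int_{Kq}^\infty e^{-3t/8}\,dt \\ &\le \int_{t_0}^{Kq} e^{-3t_0 t/(8Kq)}\,dt + \frac{8}{3}e^{-3Kq/8} \\ &\le \frac{8Kq}{3t_0}e^{-3t_0^2/(8Kq)} + \frac{8}{3}e^{-3Kq/8} = O(\sqrt{\log n}). \end{aligned}$$

Thus $\mathbb{E}[r_i] \le Kq + O(\sqrt{\log n})$. Denote $\sum_{i\in C_k} r_i \triangleq g(A_{ij} : i \in C_k, j \notin C_k)$. Then $g$ satisfies the 
bounded difference inequality, i.e., for all $i = 1, 2, \ldots, m$ with $m = (r-1)K^2$, 

$$\sup_{x_1,\ldots,x_m,x_i'}\left|g(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_m) - g(x_1,\ldots,x_{i-1},x_i',x_{i+1},\ldots,x_m)\right| \le 1.$$

It follows from McDiarmid's inequality that 

$$\mathbb{P}\left\{\sum_{i\in C_k} r_i - \sum_{i\in C_k}\mathbb{E}[r_i] \ge K\sqrt{r\log n}\right\} \le \exp\left(-\frac{2K^2 r\log n}{K^2 r}\right) = n^{-2}.$$

Thus, with probability at least $1 - n^{-2}$, $\frac{1}{K}\sum_{i\in C_k} r_i \le Kq + O(\sqrt{\log n})$. The lemma follows in view of 
the union bound. □ 

The following lemma provides a deterministic sufficient condition for the success of SDP (9) in 
the case $a > b$. 

Lemma 6. Suppose there exist $D^* = \mathrm{diag}\{d_i^*\}$ with $d_i^* > 0$ for all $i$, $B^* \in \mathcal{S}^n$ with $B^* \ge 0$ and $B^*_{ij} > 0$ 
whenever $i$ and $j$ are in distinct clusters, and $\lambda^* \in \mathbb{R}^n$ such that $S^* = D^* - B^* - A + \lambda^*\mathbf{1}^\top + \mathbf{1}(\lambda^*)^\top$ 
satisfies $S^* \succeq 0$, and 

$$S^*\xi_k^* = 0, \quad k \in [r], \tag{45}$$

$$B^*_{ij}Z^*_{ij} = 0, \quad i,j \in [n]. \tag{46}$$

Then $\widehat{Z}_{\mathrm{SDP}} = Z^*$ is the unique solution to (9). 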

Proof. Let $H = Z - Z^*$, where $Z$ is an arbitrary feasible matrix for the SDP (9). Since $Z$ and $Z^*$ are 
both feasible, $\langle D^*, H\rangle = \langle\lambda^*\mathbf{1}^\top, H\rangle = \langle\mathbf{1}(\lambda^*)^\top, H\rangle = 0$. Since $A = D^* - B^* - S^* + \lambda^*\mathbf{1}^\top + \mathbf{1}(\lambda^*)^\top$, 

$$\langle A, H\rangle = -\langle B^*, H\rangle - \langle S^*, H\rangle$$

and the following hold: 

• $\langle B^*, H\rangle \ge 0$, with equality if and only if $\langle B^*, Z\rangle = 0$. That is because $B^* \ge 0$, $Z \ge 0$, and 
$\langle B^*, Z^*\rangle = 0$. 

• $\langle S^*, H\rangle \ge 0$, with equality if and only if $\langle S^*, Z\rangle = 0$. That is because $\langle S^*, Z\rangle \ge 0$ (because 
$S^*, Z \succeq 0$) and $\langle S^*, Z^*\rangle = 0$ (because $Z^* = \sum_{k=1}^r \xi_k^*(\xi_k^*)^\top$ and $S^*\xi_k^* = 0$ for all $k \in [r]$). 

Thus, $\langle A, H\rangle \le 0$, so that $Z^*$ is a solution to the SDP. 

To prove that $Z^*$ is the unique solution, restrict attention to the case that $Z$ is another solution 
to the SDP. We need to show $Z = Z^*$. Since both $Z$ and $Z^*$ are solutions, $\langle A, H\rangle = 0$, so that 
$\langle B^*, H\rangle = \langle S^*, H\rangle = 0$. Therefore, by the above two points: $\langle B^*, Z\rangle = \langle S^*, Z\rangle = 0$. For each $i$, 
$B^*_{ij} = 0$ if and only if vertices $i$ and $j$ are in the same cluster. Also, the fact $Z \succeq 0$ and $Z_{ii} \le 1$ 
for all $i$ implies $Z_{ij} \le 1$ for all $i,j$. Thus, the only way $Z$ can meet the constraint $Z\mathbf{1} = K\mathbf{1}$ is that 
$Z_{ij} = 1$ whenever $i$ and $j$ are in the same cluster. Therefore $Z = Z^*$ and hence $Z^*$ is the unique 
solution. □ 

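Proof of Theorem 4. We now begin the proof of Theorem 4. Let $E$ denote the subspace spanned 
by the vectors $\{\xi_k^*\}_{k\in[r]}$, i.e., $E = \mathrm{span}(\xi_k^* : k \in [r])$. Ultimately, we will show that 

$$\mathbb{P}\left\{\inf_{x\perp E,\,\|x\|_2=1} x^\top S^* x > 0\right\} \to 1. \tag{47}$$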

Note that $\mathbb{E}[A] = (p-q)Z^* + qJ - pI$ and $Z^* = \sum_{k\in[r]}\xi_k^*(\xi_k^*)^\top$. Thus for any $x$ such that $x \perp E$ 
and $\|x\|_2 = 1$, 

$$\begin{aligned} x^\top S^* x &= x^\top D^* x - x^\top\mathbb{E}[A]x - x^\top B^* x + 2x^\top\lambda^*\mathbf{1}^\top x - x^\top(A - \mathbb{E}[A])x \\ &\overset{(a)}{=} x^\top D^* x - (p-q)x^\top Z^* x - qx^\top Jx + p - x^\top B^* x - x^\top(A - \mathbb{E}[A])x \\ &\overset{(b)}{=} x^\top D^* x + p - x^\top B^* x - x^\top(A - \mathbb{E}[A])x \\ &\ge x^\top D^* x + p - x^\top B^* x - \|A - \mathbb{E}[A]\|, \end{aligned} \tag{48}$$

where (a) holds because $x \perp \mathbf{1}$; (b) holds due to $\langle x, \xi_k^*\rangle = 0$ for all $k \in [r]$ and $x \perp \mathbf{1}$. In view of 
[24, Theorem 5], $\|A - \mathbb{E}[A]\| \le c'\sqrt{\log n}$ with high probability for a positive constant $c'$ depending 
only on $a$. To bound (48) from below, it is convenient to choose $B^*$ such that 

$$x^\top B^* x = 0, \quad \forall x \perp E. \tag{49}$$
Since $B^*$ is assumed to be symmetric, (49) is equivalent to requiring that 

$$B^*_{C_k\times C_{k'}}(i,j) = B^*_{C_{k'}\times C_k}(j,i) = y^*_{kk'}(i) + z^*_{kk'}(j), \quad \forall\, 1 \le k < k' \le r \tag{50}$$

for some $y^*_{kk'}$ and $z^*_{kk'}$. Next we ensure that $S^*\xi_k^* = 0$ for $k \in [r]$. Equivalently, we want to ensure 
that for any distinct $k, k' \in [r]$ and any $i \in C_k$, 

$$d_i^* = e(i, C_k) - \lambda_i^* K - \sum_{j\in C_k}\lambda_j^*, \tag{51}$$

$$e(i, C_{k'}) + \sum_{j\in C_{k'}}(B^*_{ij} - \lambda_j^*) = K\lambda_i^*. \tag{52}$$

Requiring (52) for all distinct $k, k' \in [r]$ and all $i \in C_k$ is equivalent to requiring 

$$e(j, C_k) + \sum_{i\in C_k}(B^*_{ji} - \lambda_i^*) = K\lambda_j^* \tag{53}$$

for all distinct $k, k' \in [r]$ and all $j \in C_{k'}$ (by swapping $i$ for $j$ and $k$ for $k'$). Moreover, it is equivalent 
to checking both (52) and (53) under the additional assumption that $k < k'$. Substituting (50) into 
(52) and (53) gives that for all $k, k' \in [r]$ with $k < k'$, 

$$e(i, C_{k'}) + Ky^*_{kk'}(i) + \sum_{j\in C_{k'}}(z^*_{kk'}(j) - \lambda_j^*) = K\lambda_i^* \quad \text{for } i \in C_k, \tag{54}$$

$$e(j, C_k) + \sum_{i\in C_k}(y^*_{kk'}(i) - \lambda_i^*) + Kz^*_{kk'}(j) = K\lambda_j^* \quad \text{for } j \in C_{k'}. \tag{55}$$

For $k < k'$ and $i \in C_k$, $j \in C_{k'}$, set 

$$y^*_{kk'}(i) = \frac{1}{K}(r_i - e(i, C_{k'})) + u_{kk'}, \tag{56}$$

$$z^*_{kk'}(j) = \frac{1}{K}(r_j - e(j, C_k)) + u_{kk'}, \tag{57}$$

and for $i \in C_k$, 

$$\lambda_i^* = \frac{1}{K}(r_i - \alpha_k), \tag{58}$$

where $u_{kk'}$ and $\alpha_k$ are to be determined. Equations (54) and (55) both reduce to 

$$\alpha_k + \alpha_{k'} = \frac{1}{K}e(C_k, C_{k'}) - 2Ku_{kk'}$$

(which must hold whenever $k < k'$) and (51) becomes 

$$d_i^* = e(i, C_k) - r_i + 2\alpha_k - \frac{1}{K}\sum_{j\in C_k} r_j.$$

In view of Lemma 4 and the assumption that $\sqrt{a} - \sqrt{b} > \sqrt{r}$, $\min_i(s_i - r_i) \ge \log n/\log\log n$ 
with high probability. By Lemma 5, $\max_{k\in[r]}\frac{1}{K}\sum_{i\in C_k} r_i \le Kq + O(\sqrt{\log n})$ with high probability. 
Finally set 

$$\alpha_k = \frac{1}{2}\left(Kq - \sqrt{\log n}\right), \qquad u_{kk'} = \frac{1}{2K}\left(\frac{e(C_k, C_{k'})}{K} - Kq + \sqrt{\log n}\right).$$

It follows from the definition that 

$$\mathbb{P}\left\{\min_i d_i^* \ge \log n/\log\log n - O(\sqrt{\log n})\right\} \to 1. \tag{59}$$

Thus, the desired (47) holds in view of (59) and (48). Also, note that $e(C_k, C_{k'}) \sim \mathrm{Binom}(K^2, q)$. 
For $X \sim \mathrm{Binom}(n, p_0)$ with $p_0 \in [0,1]$, Chernoff's bound yields 

$$\mathbb{P}\{X \le (1-\epsilon)np_0\} \le e^{-\epsilon^2 np_0/2}.$$

It follows that 

$$\mathbb{P}\left\{e(C_k, C_{k'}) \le K^2 q - K\sqrt{\log n}/2\right\} \le n^{-1/(8q)}.$$

Applying the union bound, we have that with high probability, $e(C_k, C_{k'}) > K^2 q - K\sqrt{\log n}$ for all 
$1 \le k < k' \le r$. Hence $y^*_{kk'}(i) > 0$ and $z^*_{kk'}(j) > 0$ for all $1 \le k < k' \le r$ and $i \in C_k$ and $j \in C_{k'}$, so 
that $B^*_{ij} > 0$ for all $i, j$ in distinct clusters as desired. □ 
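
The choices (50)-(58) together with the values of $\alpha_k$ and $u_{kk'}$ make $S^*\xi_k^* = 0$ hold identically, for any graph and any value substituted for $\sqrt{\log n}$. The following sketch checks this on a small random instance with exact rational arithmetic. It is an illustration under stated assumptions: the function name `multi_cluster_residuals` is hypothetical, and the argument `s` is a rational stand-in for $\sqrt{\log n}$.

```python
import random
from fractions import Fraction


def multi_cluster_residuals(r, K, p, q, s, seed=0):
    """Return the vectors S* xi_k (k = 0..r-1) for the certificate of (50)-(58).

    p, q, s are Fractions; s is a stand-in for sqrt(log n) (an assumption).
    """
    rng = random.Random(seed)
    n = r * K
    cluster = [i // K for i in range(n)]
    members = [[i for i in range(n) if cluster[i] == k] for k in range(r)]

    # Symmetric, zero-diagonal adjacency matrix with r equal-sized clusters.
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = float(p if cluster[i] == cluster[j] else q)
            A[i][j] = A[j][i] = 1 if rng.random() < prob else 0

    def e(i, k):
        return sum(A[i][j] for j in members[k])

    # r_i: maximum number of neighbors of i in any other cluster.
    r_max = [max(e(i, k) for k in range(r) if k != cluster[i]) for i in range(n)]
    alpha = (K * q - s) / 2                           # alpha_k, the same for every k
    lam = [(r_max[i] - alpha) / K for i in range(n)]  # (58)

    B = [[Fraction(0)] * n for _ in range(n)]
    for k in range(r):
        for k2 in range(k + 1, r):
            e_kk2 = sum(e(i, k2) for i in members[k])
            u = (Fraction(e_kk2, K) - K * q + s) / (2 * K)
            for i in members[k]:
                y = Fraction(r_max[i] - e(i, k2), K) + u      # (56)
                for j in members[k2]:
                    z = Fraction(r_max[j] - e(j, k), K) + u   # (57)
                    B[i][j] = B[j][i] = y + z                 # (50)

    d = []
    for i in range(n):
        own = members[cluster[i]]
        d.append(e(i, cluster[i]) - r_max[i] + 2 * alpha
                 - Fraction(sum(r_max[j] for j in own), K))

    def s_entry(i, j):
        return (d[i] if i == j else 0) - B[i][j] - A[i][j] + lam[i] + lam[j]

    return [[sum(s_entry(i, j) for j in members[k]) for i in range(n)] for k in range(r)]
```

Every entry of the returned vectors is an exact zero, for example with `multi_cluster_residuals(3, 4, Fraction(4, 5), Fraction(1, 5), Fraction(1, 3))`. Positivity of $d_i^*$ and $B^*_{ij}$, which is where the probabilistic estimates enter, is not checked here.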

Proof of Theorem 5. A necessary condition for exact recovery follows from the condition for two 
clusters. If a genie were to reveal the membership of all clusters except for clusters 1 and 2, 
then the exact recovery problem would be equivalent to recovering two equal-sized clusters in a 
network with $n' = 2K = \frac{2n}{r}$ vertices. And $p = \frac{a\log n}{n} = \frac{(2a/r + o(1))\log n'}{n'}$. Similarly, $q = \frac{(2b/r + o(1))\log n'}{n'}$. Thus, 
if $\sqrt{2a/r} - \sqrt{2b/r} < \sqrt{2}$, or equivalently if $\sqrt{a} - \sqrt{b} < \sqrt{r}$, the converse recovery result of [2] implies 
recovery is not possible for $r$ equal-sized clusters. That is, if $a > b$ and $\sqrt{a} - \sqrt{b} < \sqrt{r}$, then for any 
sequence of estimators $\widehat{Z}_n$, $\mathbb{P}\{\widehat{Z}_n = Z^*\} \to 0$. □ 
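
The reduction above rescales the parameters from $(a, b)$ on $n$ vertices to $(2a/r, 2b/r)$ on $n' = 2n/r$ vertices, and the two forms of the impossibility condition differ only by the factor $\sqrt{2/r}$. A minimal sketch of this bookkeeping follows; the function names are hypothetical and introduced only for illustration.

```python
import math


def two_cluster_condition_fails(a, b, r):
    """Impossibility condition for the reduced two-cluster problem on n' = 2n/r vertices."""
    a_reduced, b_reduced = 2 * a / r, 2 * b / r
    return math.sqrt(a_reduced) - math.sqrt(b_reduced) < math.sqrt(2)


def r_cluster_condition_fails(a, b, r):
    """Equivalent condition stated directly in terms of a, b and r."""
    return math.sqrt(a) - math.sqrt(b) < math.sqrt(r)
```

Away from the boundary $\sqrt{a} - \sqrt{b} = \sqrt{r}$, where floating-point rounding could decide either way, the two predicates agree.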

7.3 Proofs for Section 4: Binary censored block model 

Our analysis of the SDP relies on two key ingredients: the spectrum of labeled Erdos-Renyi random 
graph and the tail bounds for the binomial distributions, which we first present. 

Recall that $A$ is a symmetric and zero-diagonal random matrix, where the entries $\{A_{ij} : i < j\}$ 
are independent and $A_{ij} \sim p(1-\epsilon)\delta_{+1} + p\epsilon\delta_{-1} + (1-p)\delta_0$ if $i,j$ are in the same cluster; otherwise 
$A_{ij} \sim p(1-\epsilon)\delta_{-1} + p\epsilon\delta_{+1} + (1-p)\delta_0$. Then $\mathbb{E}[A_{ij}] = p(1-2\epsilon)\sigma_i^*\sigma_j^*$. Assume $p \ge c_0\frac{\log n}{n}$ for 
any constant $c_0 > 0$. We aim to show that $\|A - \mathbb{E}[A]\|_2 \le c'\sqrt{np}$ with high probability for some 
constant $c' > 0$. 

Theorem 9. For any $c > 0$, there exists $c' > 0$ such that for any $n \ge 1$, $\mathbb{P}\{\|A - \mathbb{E}[A]\|_2 \le c'\sqrt{np}\} \ge 1 - n^{-c}$. 

Proof. Let $E = (E_{ij})$ denote an $n \times n$ matrix with independent entries drawn from $\mu \triangleq \frac{p}{2}\delta_{+1} + \frac{p}{2}\delta_{-1} + (1-p)\delta_0$, 
which is the distribution of a Rademacher random variable multiplied with an 
independent Bernoulli with bias $p$. Define $E'$ as $E'_{ii} = E_{ii}$ and $E'_{ij} = -E_{ji}$ for all $i \ne j$. Let 
$A'$ be an independent copy of $A$. Let $D$ be a zero-diagonal symmetric matrix whose entries are 
drawn from $\mu$ and $D'$ be an independent copy of $D$. Let $M = (M_{ij})$ denote an $n \times n$ zero-diagonal 
symmetric matrix whose entries are Rademacher and independent from $A$ and $A'$. We apply the 
usual symmetrization arguments: 

$$\begin{aligned} \mathbb{E}[\|A - \mathbb{E}[A]\|] &= \mathbb{E}[\|A - \mathbb{E}[A']\|] \overset{(a)}{\le} \mathbb{E}[\|A - A'\|] \overset{(b)}{=} \mathbb{E}[\|(A - A')\circ M\|] \overset{(c)}{\le} 2\mathbb{E}[\|A\circ M\|] \\ &= 2\mathbb{E}[\|D\|] = 2\mathbb{E}[\|D - \mathbb{E}[D']\|] \overset{(d)}{\le} 2\mathbb{E}[\|D - D'\|] \overset{(e)}{=} 2\mathbb{E}[\|E - E'\|] \overset{(f)}{\le} 4\mathbb{E}[\|E\|], \end{aligned} \tag{60}$$

where (a), (d) follow from Jensen's inequality; (b) follows because $A - A'$ has the same 
distribution as $(A - A')\circ M$, where $\circ$ denotes the element-wise product; (c), (f) follow from the triangle 
inequality; (e) follows from the fact that $D - D'$ has the same distribution as $E - E'$. In particular, 
first, the diagonal entries of $D - D'$ and $E - E'$ are all equal to zero. Second, both $D - D'$ and 
$E - E'$ are symmetric matrices with independent upper triangular entries. Third, $D_{ij} - D'_{ij}$ is equal 
in distribution to $E_{ij} - E'_{ij}$ for all $i < j$ by definition. 

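Then, we apply the result of Seginer [39] which characterized the expected spectral norm of 
i.i.d. random matrices within universal constant factors. Let $X_j \triangleq \sum_{i=1}^n E_{ij}^2$, which are independent 
$\mathrm{Binom}(n,p)$. Since $\mu$ is symmetric, [39, Theorem 1.1] and Jensen's inequality yield 

$$\mathbb{E}[\|E\|] \le \kappa\,\mathbb{E}\left[\left(\max_{j\in[n]} X_j\right)^{1/2}\right] \le \kappa\left(\mathbb{E}\left[\max_{j\in[n]} X_j\right]\right)^{1/2} \tag{61}$$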
for some universal constant $\kappa$. In view of the following Chernoff bound for the binomial distribution 
[31, Theorem 4.4]: 

$$\mathbb{P}\{X_1 \ge t\log n\} \le 2^{-t},$$

for all $t \ge 6np/\log n$, setting $t_0 = 6\max\{np/\log n, 1\}$ and applying the union bound, we have 

$$\begin{aligned} \mathbb{E}\left[\max_{j\in[n]} X_j\right] &= \int_0^\infty \mathbb{P}\left\{\max_{j\in[n]} X_j \ge t\right\}dt \le \int_0^\infty \left(n\,\mathbb{P}\{X_1 \ge t\} \wedge 1\right)dt \\ &\le t_0\log n + n\int_{t_0\log n}^\infty 2^{-t}\,dt \le (t_0 + 1)\log n \le 6(1 + 2/c_0)np, \end{aligned} \tag{62}$$

where the last inequality follows from $np \ge c_0\log n$. Assembling (60)-(62), we obtain 

$$\mathbb{E}[\|A - \mathbb{E}[A]\|] \le c_2\sqrt{np}, \tag{63}$$

for some positive constant $c_2$ depending only on $c_0, c_1$. Since the entries of $A - \mathbb{E}[A]$ are valued in 
$[-1,1]$, Talagrand's concentration inequality for 1-Lipschitz convex functions yields 

$$\mathbb{P}\{\|A - \mathbb{E}[A]\| \ge \mathbb{E}[\|A - \mathbb{E}[A]\|] + t\} \le c_3\exp(-c_4 t^2)$$

for some absolute constants $c_3, c_4$, which implies that for any $c > 0$, there exists $c' > 0$ depending 
on $c_0, c_1$, such that $\mathbb{P}\{\|A - \mathbb{E}[A]\| \ge c'\sqrt{np}\} \le n^{-c}$. □ 

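Let $X_1, X_2, \ldots, X_m \overset{\text{i.i.d.}}{\sim} p(1-\epsilon)\delta_{+1} + p\epsilon\delta_{-1} + (1-p)\delta_0$ for $m \in \mathbb{N}$, $p \in [0,1]$ and a fixed constant 
$\epsilon \in [0,1/2]$, where $m = n + o(n)$ and $p = a\log n/n$ for some $a > 0$ as $n \to \infty$. The following upper 
tail bound for $\sum_{i=1}^m X_i$ follows from the Chernoff bound. 

Lemma 7. Assume that $k_n \in \mathbb{N}$ and $k_n = (1 + o(1))\frac{\log n}{\log\log n}$. Then 

$$\mathbb{P}\left\{\sum_{i=1}^m X_i \le k_n\right\} \le n^{-a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^2+o(1)}. \tag{64}$$

Proof. If $\epsilon = 0$, then $\sum_{i=1}^m X_i \sim \mathrm{Binom}(m,p)$ and the lemma follows from Lemma 1. Next we focus 
on the case $\epsilon > 0$. It follows from the Chernoff bound that 

$$\mathbb{P}\left\{\sum_{i=1}^m X_i \le k_n\right\} \le \exp(-m\ell(k_n/m)), \tag{65}$$

where $\ell(x) = \sup_{\lambda\ge0}\, -\lambda x - \log\mathbb{E}[e^{-\lambda X_1}]$. Since $X_1 \sim p(1-\epsilon)\delta_{+1} + p\epsilon\delta_{-1} + (1-p)\delta_0$, 

$$\mathbb{E}\left[e^{-\lambda X_1}\right] = 1 + p\left[e^{-\lambda}(1-\epsilon) + e^{\lambda}\epsilon - 1\right].$$

Notice that $-\lambda x - \log\mathbb{E}[e^{-\lambda X_1}]$ is concave in $\lambda$, so it achieves the supremum at $\lambda^*$ such that 

$$-x + \frac{p\left(e^{-\lambda^*}(1-\epsilon) - e^{\lambda^*}\epsilon\right)}{1 + p\left[e^{-\lambda^*}(1-\epsilon) + e^{\lambda^*}\epsilon - 1\right]} = 0.$$

Hence, by setting $x = k_n/m$, we get $\lambda^* = \frac{1}{2}\log\frac{1-\epsilon}{\epsilon} + o(1)$ and thus 

$$\begin{aligned} \ell(k_n/m) &= -\lambda^* k_n/m - \log\left(1 + p\left[e^{-\lambda^*}(1-\epsilon) + e^{\lambda^*}\epsilon - 1\right]\right) \\ &= -\frac{k_n}{2m}\log\frac{1-\epsilon}{\epsilon} - \log\left(1 - p(\sqrt{1-\epsilon}-\sqrt{\epsilon})^2\right) + o(k_n/m) \\ &= a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^2\log n/n + o(\log n/n), \end{aligned}$$

where the last equality holds due to the Taylor expansion of $\log(1-x)$ at $x = 0$ and $p = a\log n/n$. 
Combining the last displayed equation with (65) gives the desired (64). □ 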
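
To leading order in $p$, the Chernoff objective in the proof of Lemma 7 is $p\,[1 - e^{-\lambda}(1-\epsilon) - e^{\lambda}\epsilon]$, because $k_n/m = o(\log n/n)$ makes the linear term negligible. The sketch below illustrates numerically that the maximum of this objective over $\lambda \ge 0$ is $(\sqrt{1-\epsilon}-\sqrt{\epsilon})^2$, attained near $\lambda^* = \frac{1}{2}\log\frac{1-\epsilon}{\epsilon}$. The function names are hypothetical and the grid search is for illustration only.

```python
import math


def censored_exponent(a, eps):
    """Exponent a * (sqrt(1 - eps) - sqrt(eps))^2 appearing in (64)."""
    return a * (math.sqrt(1 - eps) - math.sqrt(eps)) ** 2


def leading_order_objective(lam, eps):
    """Leading-order term of -log E[exp(-lam * X_1)], divided by p."""
    return 1 - math.exp(-lam) * (1 - eps) - math.exp(lam) * eps


def maximize_on_grid(eps, lam_max=5.0, steps=20000):
    """Brute-force maximum of the objective over a uniform grid on [0, lam_max]."""
    return max(
        leading_order_objective(lam_max * i / steps, eps) for i in range(steps + 1)
    )
```

With $\epsilon = 0.1$ the maximizer is $\lambda^* = \frac{1}{2}\log 9$ and the maximum is $1 - 2\sqrt{0.09} = 0.4$.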

The following lemma establishes a lower tail bound for j Xj. 

Lemma 8. Let k n be defined in Lemma 7. Then 


E 

. i=1 


Xi ^ hr} 


> »(\/i-e-\/i) +o(l) 


Proof. Let /c* = [2a-\/e(l — e) lognj. Notice that YllLi x f ~ Binom(m,p). Let Zi, Z 2 ,..., Z^^ll— 
e)<5_(_i + ed_i. Then 


. 1=1 


. 2—1 


X* < -Ay j > P X* < -Ay ^ X? = k* | P <| £ X? = /C 

r fc* 'i r m 

( = } pi ^Tz, < -fc n j>P<j^X 2 = F 


( 66 ) 


. 2—1 


. 2=1 


where (a) holds because conditioning on YHiLi X i = an d have the same 

distribution. Next we lower bound P |Xa=i X* < — Ay j and P { YIhL 1 X i = k*} separately. 
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We use the following non-asymptotic bound on the binomial tail probability [9, Lemma 4.7.2]: 
For U Binorn (n,p), 

(8fc(l — A))" 1 / 2 exp(— nD(X\\p)) < P{f7 > A’} < exp(— nD(X\\p)), 

where A = ^ € (0,1) > p and D(A||p) = A log ^ + (1 — A) log is the binary divergence function. 
Let W ~ Binom(fc*, e). Then, 

( k* ^ 


Zt < -k n \=fIw> 


. i=l 


k* + k n 


> 


exp 


-k*D ( - + 

' 2 2k* 


2(k* + k n )(k* - k n ) 

= exp [-k*D( l/2||e) + o(logn)], 

Moreover, using the following bound on binomial coefficients [9, Lemma 4.7.1]: 

© 


7r 


< 


(27rnA(l — A)) -1 / 2 exp(n/i(A)) 


< 1 . 


where A = ^ € (0,1) and h( A) = —A log A — (1 — A)log(l — A) is the binary entropy function, we 
have that 




1 


= exp —a log n + k* log 


2 v / 2/t*(l - k*/n) 
ea log n 


exp {—mD(k*/n 


k* 


+ o(log n) 


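Observe that by the definition of $k^*$, $\log\frac{a\log n}{k^*} = D(1/2\|\epsilon) + o(1)$, and it follows from (66) that

$$
\begin{aligned}
\mathbb{P}\left\{\sum_{i=1}^m X_i \le -k_n\right\}
&\ge \exp\left[-a\log n + 2a\sqrt{\epsilon(1-\epsilon)}\log n + o(\log n)\right] \\
&= n^{-a\left(\sqrt{1-\epsilon}-\sqrt{\epsilon}\right)^2 + o(1)}.
\end{aligned}
$$

□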
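The last step of the proof of Lemma 8 rests on two identities: $\log\frac{1}{2\sqrt{\epsilon(1-\epsilon)}} = D(1/2\|\epsilon)$, and $-a + 2a\sqrt{\epsilon(1-\epsilon)} = -a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^2$. A short numerical check is sketched below; the values $a = 3$, $\epsilon = 0.2$ are assumed for illustration and the floor in the definition of $k^*$ is ignored.

```python
import math

def binary_divergence(lam, p):
    """D(lam || p) for two Bernoulli parameters, natural logarithm."""
    return lam * math.log(lam / p) + (1 - lam) * math.log((1 - lam) / (1 - p))

# Illustrative (assumed) parameter values.
a, eps = 3.0, 0.2
root = math.sqrt(eps * (1 - eps))

# a*log(n)/k* tends to 1/(2*root) when k* = floor(2*a*root*log n).
log_ratio = math.log(1 / (2 * root))
divergence = binary_divergence(0.5, eps)

# Exponent of n after multiplying the two lower bounds in (66).
exponent = -a + 2 * a * root
target = -a * (math.sqrt(1 - eps) - math.sqrt(eps)) ** 2
```

The first identity is what makes the terms $-k^*D(1/2\|\epsilon)$ and $k^*\log\frac{a\log n}{k^*}$ cancel, leaving only $-a\log n + k^*$ in the exponent.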

The following lemma provides a deterministic sufficient condition for the success of SDP (13) 
in the case of a > b. 

Lemma 9. Suppose there exists $D^* = \mathrm{diag}\{d_i^*\}$ such that $S^* = D^* - A$ satisfies $S^* \succeq 0$, $\lambda_2(S^*) > 0$ and

$$
S^*\sigma^* = 0. \tag{67}
$$

Then $\widehat{Y}_{\mathrm{SDP}} = Y^*$ is the unique solution to (13).

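Proof. The Lagrangian function is given by

$$
L(Y, S, D) = \langle A, Y\rangle + \langle S, Y\rangle - \langle D, Y - I\rangle,
$$

where the Lagrangian multipliers are $S \succeq 0$ and $D = \mathrm{diag}\{d_i\}$. Then for any $Y$ satisfying the constraints in (13),

$$
\langle A, Y\rangle \overset{(a)}{\le} L(Y, S^*, D^*) = \langle D^*, I\rangle = \langle D^*, Y^*\rangle = \langle A + S^*, Y^*\rangle \overset{(b)}{=} \langle A, Y^*\rangle,
$$

where (a) holds because $\langle S^*, Y\rangle \ge 0$; (b) holds because $\langle Y^*, S^*\rangle = (\sigma^*)^\top S^*\sigma^* = 0$ by (67). Hence, $Y^*$ is an optimal solution. It remains to establish its uniqueness. To this end, suppose $\widetilde{Y}$ is an optimal solution. Then,

$$
\langle S^*, \widetilde{Y}\rangle = \langle D^* - A, \widetilde{Y}\rangle \overset{(a)}{=} \langle D^* - A, Y^*\rangle = \langle S^*, Y^*\rangle = 0,
$$

where (a) holds because $\langle A, \widetilde{Y}\rangle = \langle A, Y^*\rangle$ and $\widetilde{Y}_{ii} = Y^*_{ii} = 1$ for all $i \in [n]$. In view of (67), since $\widetilde{Y} \succeq 0$ and $S^* \succeq 0$ with $\lambda_2(S^*) > 0$, $\widetilde{Y}$ must be a multiple of $Y^* = \sigma^*(\sigma^*)^\top$. Because $\widetilde{Y}_{ii} = 1$ for all $i \in [n]$, $\widetilde{Y} = Y^*$. □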

Proof of Theorem 6. Let $D^* = \mathrm{diag}\{d_i^*\}$ with

$$
d_i^* = \sum_{j=1}^n A_{ij}\sigma_i^*\sigma_j^*. \tag{68}
$$


It suffices to show that S* = D* — A satisfies the conditions in Lemma 9 with high probability. 

By definition, $d_i^*\sigma_i^* = \sum_j A_{ij}\sigma_j^*$ for all $i$, i.e., $D^*\sigma^* = A\sigma^*$. Thus (67) holds, that is, $S^*\sigma^* = 0$. It remains to verify that $S^* \succeq 0$ and $\lambda_2(S^*) > 0$ with probability converging to one, which amounts to showing that

$$
\mathbb{P}\left\{\inf_{x \perp \sigma^*,\ \|x\|_2 = 1} x^\top S^* x > 0\right\} \to 1. \tag{69}
$$


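Note that $\mathbb{E}[A] = (1-2\epsilon)p(Y^* - I)$ and $Y^* = \sigma^*(\sigma^*)^\top$. Thus for any $x$ such that $x \perp \sigma^*$ and $\|x\|_2 = 1$,

$$
\begin{aligned}
x^\top S^* x &= x^\top D^* x - x^\top \mathbb{E}[A]\, x - x^\top (A - \mathbb{E}[A])\, x \\
&= x^\top D^* x - (1-2\epsilon)p\, x^\top Y^* x + (1-2\epsilon)p - x^\top (A - \mathbb{E}[A])\, x \\
&\overset{(a)}{=} x^\top D^* x + (1-2\epsilon)p - x^\top (A - \mathbb{E}[A])\, x \\
&\ge \min_{i \in [n]} d_i^* + (1-2\epsilon)p - \|A - \mathbb{E}[A]\|,
\end{aligned}
\tag{70}
$$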

where (a) holds since $\langle x, \sigma^*\rangle = 0$. It follows from Theorem 9 that $\|A - \mathbb{E}[A]\| \le c'\sqrt{\log n}$ with high probability for a positive constant $c'$ depending only on $a$. Moreover, note that each $d_i^*$ is equal in distribution to $\sum_{i=1}^{n-1} X_i$, where the $X_i$ are identically and independently distributed according to $p(1-\epsilon)\delta_{+1} + p\epsilon\delta_{-1} + (1-p)\delta_0$. Hence, Lemma 7 implies that

$$
\mathbb{P}\left\{\sum_{i=1}^{n-1} X_i \ge \frac{\log n}{\log\log n}\right\} \ge 1 - n^{-a\left(\sqrt{1-\epsilon}-\sqrt{\epsilon}\right)^2 + o(1)}.
$$

Applying the union bound implies that $\min_{i\in[n]} d_i^* \ge \frac{\log n}{\log\log n}$ holds with probability at least $1 - n^{1 - a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^2 + o(1)}$. It follows from the assumption $a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^2 > 1$ and (70) that the desired (69) holds, completing the proof. □
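The dual certificate used in this proof is fully explicit, so it can be built directly from an observed matrix. The sketch below samples a small censored block model, forms $d_i^* = \sum_j A_{ij}\sigma_i^*\sigma_j^*$ and $S^* = D^* - A$, and evaluates $S^*\sigma^*$. The sampler, the function names, and the values $n = 40$, $p = 0.5$, $\epsilon = 0.1$ are assumptions for illustration only; they are not in the asymptotic regime $p = a\log n/n$ of the theorem.

```python
import random

def sample_censored_block_model(sigma, p, eps, rng):
    """Each pair is observed w.p. p; an observed label equals sigma_i*sigma_j, flipped w.p. eps."""
    n = len(sigma)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                label = sigma[i] * sigma[j]
                if rng.random() < eps:
                    label = -label
                A[i][j] = A[j][i] = label
    return A

def dual_certificate(A, sigma):
    """Return d* as in (68) and S* = D* - A."""
    n = len(sigma)
    d = [sum(A[i][j] * sigma[i] * sigma[j] for j in range(n)) for i in range(n)]
    S = [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return d, S

rng = random.Random(0)
n = 40
sigma = [rng.choice((-1, 1)) for _ in range(n)]
A = sample_censored_block_model(sigma, p=0.5, eps=0.1, rng=rng)
d, S = dual_certificate(A, sigma)

# (67): S* sigma* = 0 holds identically, by construction of d*.
S_sigma = [sum(S[i][j] * sigma[j] for j in range(n)) for i in range(n)]
```

Condition (67) holds for every realization; the probabilistic content of the theorem lies in $\lambda_2(S^*) > 0$, which could be checked numerically by passing `S` to a symmetric eigenvalue routine such as `numpy.linalg.eigvalsh`.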

Proof of Theorem 7. The prior distribution of $\sigma^*$ is uniform over $\{\pm 1\}^n$. First consider the case of $\epsilon = 0$. If $a < 1$, then the number of isolated vertices tends to infinity in probability [17]. Notice that for an isolated vertex $i$, $\sigma_i^*$ is equally likely to be $+1$ or $-1$ conditional on the graph. Hence, the probability of exact recovery converges to 0.

Next we consider $\epsilon > 0$. Since the prior distribution of $\sigma^*$ is uniform, the ML estimator minimizes the error probability among all estimators and thus we only need to find when the ML estimator fails. Let $e(i,T) = \sum_{j\in T}|A_{ij}|$ denote the number of edges between vertex $i$ and vertices in set $T \subset [n]$. Let $s_i = \sum_{j:\, \sigma_j^* = \sigma_i^*} A_{ij}$ and $r_i = \sum_{j:\, \sigma_j^* \ne \sigma_i^*} A_{ij}$. Let $F$ denote the event that $\min_{i\in[n]} (s_i - r_i) \le -1$. Notice that $F$ implies the existence of $i \in [n]$ such that $\sigma'$ with $\sigma_i' = -\sigma_i^*$ and $\sigma_j' = \sigma_j^*$ for $j \ne i$ achieves a strictly higher likelihood than $\sigma^*$. Hence $\mathbb{P}\{\text{ML fails}\} \ge \mathbb{P}\{F\}$. Next we bound $\mathbb{P}\{F\}$ from below.

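Let $T$ denote the set of the first $\lfloor n/\log^2 n \rfloor$ vertices and $T^c = [n]\setminus T$. Let $s_i' = \sum_{j\in T^c:\, \sigma_j^* = \sigma_i^*} A_{ij}$ and $r_i' = \sum_{j\in T^c:\, \sigma_j^* \ne \sigma_i^*} A_{ij}$. Then

$$
\min_{i\in[n]} (s_i - r_i) \le \min_{i\in T} (s_i - r_i) \le \min_{i\in T} (s_i' - r_i') + \max_{i\in T} e(i,T). \tag{71}
$$

Let $E_1, E_2$ denote the events that $\max_{i\in T} e(i,T) \le \frac{\log n}{\log\log n} - 1$ and $\min_{i\in T}(s_i' - r_i') \le -\frac{\log n}{\log\log n}$, respectively. In view of (71), we have $F \supset E_1 \cap E_2$ and hence it boils down to proving that $\mathbb{P}\{E_i\} \to 1$ for $i = 1, 2$.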

Notice that $e(i,T) \sim \mathrm{Binom}(|T|, a\log n/n)$. In view of the following Chernoff bound for binomial distributions [31, Theorem 4.4]: for $r \ge 1$ and $X \sim \mathrm{Binom}(n,p)$, $\mathbb{P}\{X \ge rnp\} \le (e/r)^{rnp}$, we have

$$
\mathbb{P}\left\{e(i,T) \ge \frac{\log n}{\log\log n} - 1\right\} \le \left(\frac{\log^2 n}{a e \log\log n}\right)^{-\log n/\log\log n + 1} = n^{-2+o(1)}.
$$


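Applying the union bound yields

$$
\mathbb{P}\{E_1\} \ge 1 - \sum_{i\in T} \mathbb{P}\left\{e(i,T) \ge \frac{\log n}{\log\log n} - 1\right\} \ge 1 - n^{-1+o(1)}.
$$

Moreover,

$$
\begin{aligned}
\mathbb{P}\{E_2\} &\overset{(a)}{=} 1 - \prod_{i\in T} \mathbb{P}\left\{s_i' - r_i' > -\frac{\log n}{\log\log n}\right\} \\
&\overset{(b)}{\ge} 1 - \left(1 - n^{-a\left(\sqrt{1-\epsilon}-\sqrt{\epsilon}\right)^2 + o(1)}\right)^{|T|} \\
&\overset{(c)}{\ge} 1 - \exp\left(-n^{1 - a\left(\sqrt{1-\epsilon}-\sqrt{\epsilon}\right)^2 + o(1)}\right) \overset{(d)}{\to} 1,
\end{aligned}
$$

where (a) holds because $\{s_i' - r_i'\}_{i\in T}$ are mutually independent; (b) follows from Lemma 8; (c) is due to $1 + x \le e^x$ for all $x \in \mathbb{R}$; (d) follows from the assumption that $a(\sqrt{1-\epsilon}-\sqrt{\epsilon})^2 < 1$. Thus $\mathbb{P}\{F\} \to 1$ and the theorem follows. □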
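The event $E_1$ in the proof above is controlled by the Chernoff bound $\mathbb{P}\{X \ge rnp\} \le (e/r)^{rnp}$. The sketch below compares that bound with the exact binomial tail for one small instance; the function names and the values (400 trials, success probability 0.005, threshold 8) are assumed for illustration and are not the quantities $|T|$ and $a\log n/n$ of the proof.

```python
import math

def binomial_upper_tail(n, p, k):
    """Exact P{X >= k} for X ~ Binom(n, p), by direct summation."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def chernoff_bound(n, p, r):
    """Right-hand side (e/r)^{r n p} of the Chernoff bound, valid for r >= 1."""
    return (math.e / r) ** (r * n * p)

# Illustrative (assumed) values: mean n*p = 2, threshold k = 8, so r = 4.
n_trials, p, k = 400, 0.005, 8
r = k / (n_trials * p)
exact = binomial_upper_tail(n_trials, p, k)
bound = chernoff_bound(n_trials, p, r)
```

In the proof the mean $|T|\,a\log n/n$ is of order $1/\log n$ while the threshold is of order $\log n/\log\log n$, so $r$ grows like $\log^2 n/\log\log n$ and the bound decays as $n^{-2+o(1)}$.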


7.4 Proofs for Section 5: General cluster structure 

We first present a dual certificate lemma which is useful for the proof of Theorem 8. Recall that $\xi_k$ denotes the indicator vector of cluster $k$ for $k \in [r]$ and $Z^* = \sum_{k=1}^r \xi_k\xi_k^\top$.

Lemma 10. Suppose there exist $D^* = \mathrm{diag}\{d_i^*\}$ with $d_i^* > 0$ for inlier vertices $i$ and $d_i^* = 0$ for outlier vertices $i$, $B^* \in \mathcal{S}^n$ with $B^* \ge 0$ and $B^*_{ij} > 0$ whenever $i$ and $j$ belong to distinct clusters, $\eta^* \in \mathbb{R}$, and $\lambda^* \in \mathbb{R}$ such that $S^* = D^* - B^* - A + \eta^* I + \lambda^* J$ satisfies $S^* \succeq 0$, $\lambda_{r+1}(S^*) > 0$ (where $\lambda_{r+1}(S^*)$ is the $(r+1)$-th smallest eigenvalue of $S^*$), and

$$
S^*\xi_k = 0, \quad k \in [r], \tag{72}
$$

$$
B^*_{ij}Z^*_{ij} = 0, \quad i,j \in [n]. \tag{73}
$$

(If the penalized SDP (16) is used, the same $\eta^*$ and $\lambda^*$ should be used in the SDP and in this lemma.) Then $Z^*$ is the unique solution to both SDP (15) and (16) (i.e., $\widehat{Z}_{\mathrm{SDP}}$ produced by either SDP is equal to $Z^*$).

Proof. Let $H = Z - Z^*$, where $Z$ is either an arbitrary feasible matrix for the SDP (15) or an arbitrary feasible matrix for the SDP (16). Since $A - \eta^* I - \lambda^* J = D^* - B^* - S^*$,

$$
\langle A, H\rangle - \eta^*\langle I, H\rangle - \lambda^*\langle J, H\rangle = \langle D^*, H\rangle - \langle B^*, H\rangle - \langle S^*, H\rangle
$$

and the following hold:

• $\langle D^*, H\rangle \le 0$, with equality if and only if $Z_{ii} = 1$ for all inliers $i$. That is because for inliers $i$, $d_i^* > 0$ and $Z_{ii} \le 1 = Z^*_{ii}$, and for outliers $i$, $d_i^* = 0$.

• $\langle B^*, H\rangle \ge 0$, with equality if and only if $\langle B^*, Z\rangle = 0$. That is because $B^* \ge 0$, $Z \ge 0$, and $\langle B^*, Z^*\rangle = 0$.

• $\langle S^*, H\rangle \ge 0$, with equality if and only if $\langle S^*, Z\rangle = 0$. That is because $\langle S^*, Z\rangle \ge 0$ (because $S^*, Z \succeq 0$) and $\langle S^*, Z^*\rangle = 0$ (because $Z^*$ is a sum of matrices of the form $\xi_k\xi_k^\top$ and $S^*\xi_k = 0$ for all $k$).

Thus, $\langle A, H\rangle - \eta^*\langle I, H\rangle - \lambda^*\langle J, H\rangle \le 0$. Therefore, $Z^*$ is a solution to SDP (16). If $Z$ is a feasible solution for the SDP (15) (as $Z^*$ is), then $\langle I, H\rangle = \langle J, H\rangle = 0$, so we conclude that $\langle A, H\rangle \le 0$, and hence $Z^*$ is also a solution to SDP (15).

To prove that $Z^*$ is the unique solution, restrict attention to the case that $Z$ is another solution to either one of the SDPs. We need to show $Z = Z^*$. Since both $Z$ and $Z^*$ are solutions, $\langle A, H\rangle - \eta^*\langle I, H\rangle - \lambda^*\langle J, H\rangle = 0$, so that $\langle D^*, H\rangle = \langle B^*, H\rangle = \langle S^*, H\rangle = 0$. Therefore, by the above three points: $Z_{ii} = 1$ for all inliers $i$, and $\langle B^*, Z\rangle = \langle S^*, Z\rangle = 0$.

Since $B^*_{ij} > 0$ whenever $i$ and $j$ are in distinct clusters, and $Z \ge 0$ and $B^* \ge 0$, the condition $\langle B^*, Z\rangle = 0$ implies that $Z_{ij} = 0$ whenever $i$ and $j$ are in distinct clusters. By assumption, $\xi_k$ is an eigenvector of $S^*$ with corresponding eigenvalue zero, for $1 \le k \le r$. Since $\lambda_{r+1}(S^*) > 0$, it follows that all the other eigenvalues of $S^*$ are strictly positive. The condition $\langle S^*, Z\rangle = 0$ thus implies that all the other eigenvectors of $S^*$ are in the null space of $Z$, so the eigenvectors of $Z$ corresponding to the positive eigenvalues of $Z$ must be in the span of $\{\xi_1, \ldots, \xi_r\}$. It follows that $Z$ is a linear combination of matrices of the form $\xi_k\xi_{k'}^\top$, for $k, k' \in [r]$. Consequently, $Z_{ij} = 0$ if either $i$ or $j$ is an outlier vertex, or both are outlier vertices. Moreover, whenever $i$ and $j$ are in the same cluster, $Z_{ij} = Z_{ii} = 1$. In conclusion, $Z = Z^*$. □

Proof of Theorem 8. For $k \in [r]$, denote by $C_k \subset [n]$ the support of the $k$-th cluster. Also, let $C_0$ denote the set of outlier vertices. For a set $T$ of vertices, let $e(i,T) = \sum_{j\in T} A_{ij}$ and $e(T',T) = \sum_{i\in T'} e(i,T)$. Let $k(i)$ denote the index of the cluster containing vertex $i$. Denote the number of neighbors of $i$ in its own cluster by $s_i = e(i, C_{k(i)})$ and the maximum number of neighbors of $i$ in other clusters by $r_i = \max_{k' \in [r]:\, k' \ne k(i)} e(i, C_{k'})$.

Now, let us construct $(D^*, B^*, \eta^*, \lambda^*)$ such that the conditions of Lemma 10 hold with high probability. Notice that $d_i^* = 0$ if $i$ is an outlier and $B^*_{C_k\times C_k} = 0$ for $k \in [r]$. In order that $(S^*\xi_k)_i = 0$ for $i \in C_k$ and $k \in [r]$, we must choose:

$$
d_i^* = \begin{cases}
s_i - \eta^* - \lambda^* K_k & i \in C_k,\ k \in [r] \\
0 & i \text{ is an outlier.}
\end{cases}
$$

The condition $S^*\xi_k = 0$ for $k \in [r]$ also partially constrains the symmetric matrix $B^*$. We should try to be economical in the choice of $B^*$ so that we have a chance to prove that $S^* \succeq 0$. Let

$$
B^*_{C_k\times C_{k'}}(i,j) = \begin{cases}
\lambda^* - \dfrac{e(i,C_{k'})}{K_{k'}} - \dfrac{e(j,C_k)}{K_k} + \dfrac{e(C_k,C_{k'})}{K_kK_{k'}} & k \ne k',\ k,k' \in [r] \\[2mm]
\lambda^* - \dfrac{e(i,C_{k'})}{K_{k'}} & k = 0,\ k' \in [r] \\[2mm]
\lambda^* - \dfrac{e(j,C_k)}{K_k} & k' = 0,\ k \in [r] \\[2mm]
0 & k = k' \in \{0,\ldots,r\}.
\end{cases}
$$

Then $B^* = (B^*)^\top$, $S^*\xi_k = 0$ for $k \in [r]$, and $B^*_{ij}Z^*_{ij} = 0$. It remains to show $d_i^* > 0$, $B^*_{ij} > 0$ whenever $i$ and $j$ are in distinct clusters, and $S^* \succeq 0$ for some choice of $\lambda^*$ and $\eta^*$. Let $E_r = \mathrm{span}\{\xi_1, \ldots, \xi_r\}$. We need to show $x^\top S^* x > 0$ for $x \in \mathbb{R}^n$ with $x \perp E_r$. A nice thing about the choice of $B^*$ (and it uniquely determined the choice of $B^*$) is that, for $x \perp E_r$,

$$
x^\top B^* x = \sum_{k,k'\in\{0,\ldots,r\},\, k\ne k'}\ \sum_{i\in C_k}\sum_{j\in C_{k'}} B^*_{C_k\times C_{k'}}(i,j)\, x_i x_j = 0,
$$

where we used the fact that for each pair of distinct $k$ and $k'$, each of the terms in the definition of $B^*_{C_k\times C_{k'}}(i,j)$ is either constant in $i$ or constant in $j$, or both, and if $k = 0$ the terms are constant in $j$ and if $k' = 0$ the terms are constant in $i$. The needed condition $d_i^* > 0$ involves getting a lower bound on the number of edges a vertex $i$ has to other vertices in its own cluster (we can concentrate on the smallest cluster for that purpose), while the needed condition $B^* \ge 0$ involves an upper bound on the number of edges between a vertex $i$ in one cluster and the vertices of a different cluster.
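The two structural properties of $B^*$ used here, namely that its row sums over a cluster $C_k$ equal $\lambda^*K_k - e(i,C_k)$ for $i \notin C_k$ and that $x^\top B^* x = 0$ for $x \perp E_r$, can be checked numerically on a small instance. The sketch below is illustrative only: the graph (two clusters of sizes 8 and 6 plus 4 outliers, edge probabilities 0.6 and 0.2), the value $\lambda^* = 0.5$, and all function names are assumptions, and entrywise nonnegativity of $B^*$ is not checked.

```python
import random

def cluster_labels(clusters, n):
    """Map each vertex to its cluster index; index 0 is the outlier set C_0."""
    label = [0] * n
    for k, members in enumerate(clusters):
        for i in members:
            label[i] = k
    return label

def build_B(A, clusters, lam):
    """Assemble B* entry by entry from the four-case formula in the proof."""
    n = len(A)
    label = cluster_labels(clusters, n)
    size = [len(members) for members in clusters]
    # deg[i][k] = e(i, C_k); cross[k][kp] = e(C_k, C_k')
    deg = [[sum(A[i][j] for j in members) for members in clusters] for i in range(n)]
    cross = [[sum(deg[i][kp] for i in clusters[k]) for kp in range(len(clusters))]
             for k in range(len(clusters))]
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k, kp = label[i], label[j]
            if k == kp:
                continue  # diagonal blocks, including outlier-outlier, are zero
            value = lam
            if kp != 0:
                value -= deg[i][kp] / size[kp]
            if k != 0:
                value -= deg[j][k] / size[k]
            if k != 0 and kp != 0:
                value += cross[k][kp] / (size[k] * size[kp])
            B[i][j] = value
    return B

rng = random.Random(1)
clusters = [[14, 15, 16, 17], list(range(0, 8)), list(range(8, 14))]  # C_0, C_1, C_2
n = 18
label = cluster_labels(clusters, n)
A = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        same = label[i] == label[j] and label[i] != 0
        if rng.random() < (0.6 if same else 0.2):
            A[i][j] = A[j][i] = 1

lam = 0.5
B = build_B(A, clusters, lam)

# Project a random vector onto the orthogonal complement of span{xi_1, xi_2}.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
for members in clusters[1:]:
    mean = sum(x[i] for i in members) / len(members)
    for i in members:
        x[i] -= mean
quad_form = sum(B[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Row sums over C_k for vertices outside C_k, compared with lam*K_k - e(i, C_k).
row_gap = 0.0
for k in (1, 2):
    for i in range(n):
        if label[i] != k:
            row_sum = sum(B[i][j] for j in clusters[k])
            expected = lam * len(clusters[k]) - sum(A[i][j] for j in clusters[k])
            row_gap = max(row_gap, abs(row_sum - expected))
```

Note that the projected vector is unconstrained on the outlier coordinates, matching the fact that $E_r$ is spanned only by the indicator vectors of the $r$ clusters.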

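Let us next examine conditions to ensure $\lambda_{r+1}(S^*) > 0$. We use $\mathbb{E}[A] = (p-q)Z^* - pI_{\mathrm{i}} - qI_{\mathrm{o}} + qJ$, where $I_{\mathrm{i}} + I_{\mathrm{o}}$ is a decomposition of the identity matrix for inlier vertices and outlier vertices. For any $x \perp E_r$, we have $x^\top B^* x = x^\top Z^* x = 0$. Therefore, for any $x \perp E_r$ with $\|x\|_2 = 1$, and taking $\eta^* = \|A - \mathbb{E}[A]\|$, we have

$$
\begin{aligned}
x^\top S^* x &= x^\top D^* x + (\lambda^* - q)\, x^\top J x + p\sum_{i\in C_1\cup\cdots\cup C_r} x_i^2 + q\sum_{i\in C_0} x_i^2 + \eta^* - x^\top (A - \mathbb{E}[A])\, x \\
&\ge x^\top D^* x + (\lambda^* - q)\, x^\top J x + p\sum_{i\in C_1\cup\cdots\cup C_r} x_i^2 + q\sum_{i\in C_0} x_i^2 \\
&= x^\top D^* x + (\lambda^* - q)\Big(\sum_{i\in C_0} x_i\Big)^2 + p\sum_{i\in C_1\cup\cdots\cup C_r} x_i^2 + q\sum_{i\in C_0} x_i^2,
\end{aligned}
$$

where $\xi_0$ is the indicator function for the set of outlier vertices and we used the fact that $J = \mathbf{1}\mathbf{1}^\top$ and $\mathbf{1} = (\mathbf{1} - \xi_0) + \xi_0$. From this it is clear that if $\lambda^* \ge q$, then $\lambda_{r+1}(S^*) > 0$. So we will be sure to select $\lambda^* \ge q$. In fact, that will be needed to ensure that $B^*_{ij} \ge 0$ for all $i,j$.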

It remains to select $\lambda^*$ so that $d_i^* > 0$ and $B^*_{ij} > 0$ for all $i,j$ with high probability. Let $\lambda^* = \tau\log n/n$ with $\tau = b + \psi_1 + \psi_2$, where $\psi_1$ and $\psi_2$ satisfy the assumptions (18)-(21). Then, for inlier vertex $i \in C_k$, in view of Lemma 1 and the definition of $I(\cdot,\cdot)$ in (17),

$$
\mathbb{P}\left\{s_i \le \lambda^* K_k + \frac{\log n}{\log\log n}\right\} \le n^{-\rho_k I(a,\tau) + o(1)} \le n^{-\rho_r I(a,\tau) + o(1)}.
$$

Applying the union bound yields that with probability at least $1 - n^{1-\rho_r I(a,\tau)+o(1)}$, for $1 \le k \le r$, $\min_{i\in C_k} s_i > \lambda^* K_k + \log n/\log\log n$. The matrix concentration inequality given in [24, Theorem 5] shows that $\eta^* = \|A - \mathbb{E}[A]\| = O(\sqrt{\log n})$ with high probability. Therefore, by the assumption $\rho_r I(a,\tau) > 1$ and the definition of $d_i^*$, it follows that with high probability, $\min_{i\notin C_0} d_i^* > 0$.

Turning next to the $B^*_{ij}$'s for $(i,j) \in C_k\times C_{k'}$, it suffices to consider the two following cases:

Case 1: $k$ and $k'$ correspond to the smallest and second smallest clusters, $r-1$ and $r$.

Note that $\frac{e(C_k,C_{k'})}{K_kK_{k'}}$ is very close to $q$ with high probability, so we can replace it by $q = \frac{b\log n}{n}$, which is also the mean of $\frac{e(i,C_{k'})}{K_{k'}}$ and $\frac{e(j,C_k)}{K_k}$. Specifically, it follows from the Chernoff bound that

$$
\mathbb{P}\left\{\frac{e(C_k,C_{k'})}{K_kK_{k'}} \le q - \frac{2\sqrt{q\log n}}{\sqrt{K_kK_{k'}}}\right\} = \mathbb{P}\left\{e(C_k,C_{k'}) \le \mu(1-\epsilon)\right\} \le e^{-\epsilon^2\mu/2} = n^{-2},
$$

where $\mu = qK_kK_{k'}$ and $\epsilon = \frac{2\sqrt{\log n}}{\sqrt{qK_kK_{k'}}}$. In view of Lemma 1 and the union bound,

$$
\begin{aligned}
\mathbb{P}\left\{\max_{i\in C_{r-1}} e(i,C_r) \ge (b+\psi_1)K_r\log n/n - \frac{\log n}{\log\log n}\right\} &\le n^{1-\rho_r I(b,\,b+\psi_1)+o(1)}, \\
\mathbb{P}\left\{\max_{i\in C_r} e(i,C_{r-1}) \ge (b+\psi_2)K_{r-1}\log n/n - \frac{\log n}{\log\log n}\right\} &\le n^{1-\rho_{r-1} I(b,\,b+\psi_2)+o(1)}.
\end{aligned}
$$

By the assumptions $\rho_r I(b, b+\psi_1) > 1$ and $\rho_{r-1} I(b, b+\psi_2) > 1$, it follows that with high probability $B^*_{C_k\times C_{k'}} > 0$.


Case 2: $k$ corresponds to the smallest cluster and $C_{k'}$ is the set of outliers (i.e. $k = r$, $k' = 0$). In view of Lemma 1 and the union bound,

$$
\mathbb{P}\left\{\max_{i\in C_0} e(i,C_r) \ge \tau K_r\log n/n\right\} \le n^{1-\rho_r I(b,\tau)+o(1)}.
$$

By the assumption $\rho_r I(b,\tau) > 1$, it follows that with high probability $B^*_{C_k\times C_{k'}} > 0$.

In conclusion, we have constructed $(D^*, B^*, \eta^*, \lambda^*)$ such that the conditions of Lemma 10 hold with high probability. Therefore, the theorem follows by applying Lemma 10. □

Lemma 11. Let $\tau = \frac{a-b}{\log a - \log b}$ for $0 < b < a$. Then $I(b,\tau) \le (\sqrt{a}-\sqrt{b})^2 \le 2I(b,\tau)$.

Proof. Notice that $I(a,x) + I(b,x)$ is strictly convex in $x$. By setting the derivative to be zero, we find that it achieves its minimum value, $(\sqrt{a}-\sqrt{b})^2$, at $x = \sqrt{ab}$. By definition, $I(a,\tau) = I(b,\tau)$ and $\sqrt{ab} \le \tau$. Thus, $(\sqrt{a}-\sqrt{b})^2 \le I(a,\tau) + I(b,\tau) = 2I(b,\tau)$. Moreover, $I(a,x)$ is decreasing for $x < a$ and $I(b,x)$ is non-negative. Therefore, $I(b,\tau) = I(a,\tau) \le I(a,\sqrt{ab}) \le (\sqrt{a}-\sqrt{b})^2$. □
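The inequalities of Lemma 11 are easy to probe numerically. The sketch below assumes that the function in (17) is $I(x,y) = x - y\log\frac{ex}{y}$, which is consistent with the identities used in the proof (minimum of $I(a,x)+I(b,x)$ at $x = \sqrt{ab}$ and $I(a,\tau) = I(b,\tau)$); the function names and the values $a = 9$, $b = 1$ are illustrative.

```python
import math

def rate_I(x, y):
    """Assumed form of I(x, y) from (17): x - y*log(e*x/y)."""
    return x - y * math.log(math.e * x / y)

def log_mean(a, b):
    """tau = (a - b) / (log a - log b)."""
    return (a - b) / (math.log(a) - math.log(b))

# Illustrative (assumed) values.
a, b = 9.0, 1.0
tau = log_mean(a, b)
gap_sq = (math.sqrt(a) - math.sqrt(b)) ** 2

I_a = rate_I(a, tau)
I_b = rate_I(b, tau)
min_sum = rate_I(a, math.sqrt(a * b)) + rate_I(b, math.sqrt(a * b))
```

Here `min_sum` reproduces $(\sqrt{a}-\sqrt{b})^2$, and `I_a` and `I_b` coincide, which is the defining property of $\tau$.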

Lemma 12. For any $p > 0$ and $x > 0$, $I(p, p+2x) \le 4I(p, p+x)$.

Proof. Let $f(x) = I(p, p+x)$ for $x \ge 0$. Then $f(0) = f'(0) = 0$ and $f''(s) = \frac{1}{p+s}$. Therefore,

$$
f(x) = \int_0^x\!\!\int_0^y f''(s)\,ds\,dy = \int_0^x\!\!\int_0^y \frac{1}{p+s}\,ds\,dy = \int_0^x\!\!\int_s^x \frac{1}{p+s}\,dy\,ds = \int_0^x \frac{x-s}{p+s}\,ds.
$$

Thus, using a change of variables $s = 2t$,

$$
f(2x) = \int_0^{2x} \frac{2x-s}{p+s}\,ds = 4\int_0^x \frac{x-t}{p+2t}\,dt.
$$

Comparing the expressions for $f(x)$ and $f(2x)$ completes the proof.

□
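A quick grid check of Lemma 12 is sketched below, again under the assumption that (17) defines $I(x,y) = x - y\log\frac{ex}{y}$, for which $\frac{\partial^2}{\partial y^2} I(p,y) = 1/y$ as used in the proof. The grid values and names are illustrative.

```python
import math

def rate_I(x, y):
    """Assumed form of I(x, y) from (17): x - y*log(e*x/y)."""
    return x - y * math.log(math.e * x / y)

# Ratio I(p, p + 2x) / I(p, p + x) over a small illustrative grid.
ratios = [rate_I(p, p + 2 * x) / rate_I(p, p + x)
          for p in (0.5, 1.0, 5.0)
          for x in (0.01, 0.1, 1.0, 10.0)]
worst_ratio = max(ratios)
```

The ratio approaches 4 as $x \to 0$, where $I(p, p+x) \approx x^2/(2p)$, so the constant in the lemma cannot be improved.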
A Behavior of threshold function in (4)


Recall $\eta(\rho, a, b)$ defined in (4), which governs the sharp recovery threshold for the asymmetric binary SBM. The following lemma implies $\eta(\rho, a, b)$ is minimized at $\rho = 1/2$.

Lemma 13. For any $a > b > 0$, $\eta(\rho, a, b)$ is convex in $\rho$ over $[0,1]$, and symmetric about $\rho = 1/2$.
Proof. Recall that

$$
\eta(\rho, a, b) = \frac{a+b}{2} - \gamma + \frac{(1-2\rho)\tau}{2}\log\frac{(\gamma + (1-2\rho)\tau)\rho}{(\gamma - (1-2\rho)\tau)(1-\rho)},
$$

and from this expression it is easily checked that $\eta$ is symmetric about $\rho = 1/2$.

Let $\eta'$, $\eta''$ denote the first-order and second-order derivatives of $\eta$ with respect to $\rho$, respectively. We show that $\eta'' \ge 0$. Recall that $\gamma = \sqrt{(1-2\rho)^2\tau^2 + 4\rho(1-\rho)ab}$. Hence,

$$
\frac{d\gamma}{d\rho} = \frac{-4(1-2\rho)\tau^2 + 4(1-2\rho)ab}{2\gamma} = \frac{2(1-2\rho)(ab-\tau^2)}{\gamma}.
$$

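Let $h(\rho) = \log\frac{(\gamma + (1-2\rho)\tau)\rho}{(\gamma - (1-2\rho)\tau)(1-\rho)}$ and then

$$
\begin{aligned}
\frac{dh}{d\rho} &= \frac{1}{\rho} + \frac{1}{1-\rho} + \frac{d\gamma/d\rho - 2\tau}{\gamma + (1-2\rho)\tau} - \frac{d\gamma/d\rho + 2\tau}{\gamma - (1-2\rho)\tau} \\
&= \frac{1}{\rho(1-\rho)} - \frac{d\gamma}{d\rho}\,\frac{2(1-2\rho)\tau}{4\rho(1-\rho)ab} - \frac{4\tau\gamma}{4\rho(1-\rho)ab} \\
&= \frac{\gamma-\tau}{\rho(1-\rho)\gamma}.
\end{aligned}
$$

It follows that

$$
\begin{aligned}
\eta' &= -\frac{d\gamma}{d\rho} + \frac{(1-2\rho)\tau}{2}\,\frac{dh}{d\rho} - \tau h \\
&= -2(1-2\rho)\frac{ab-\tau^2}{\gamma} + \frac{(1-2\rho)\tau}{2}\cdot\frac{\gamma-\tau}{\rho(1-\rho)\gamma} - \tau h \\
&\overset{(a)}{=} \frac{(1-2\rho)(\tau-\gamma)}{2\rho(1-\rho)} - \tau h,
\end{aligned}
$$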
where (a) follows using the expression of $\gamma$. Therefore,

$$
\begin{aligned}
\eta'' &= -\frac{1}{2}\left(\frac{1}{\rho^2} + \frac{1}{(1-\rho)^2}\right)(\tau-\gamma) - \frac{1-2\rho}{2\rho(1-\rho)}\,\frac{d\gamma}{d\rho} - \tau\frac{dh}{d\rho} \\
&= \frac{\rho^2+(1-\rho)^2}{2\rho^2(1-\rho)^2}(\gamma-\tau) + \frac{(1-2\rho)^2(\tau^2-ab)}{\rho(1-\rho)\gamma} + \frac{\tau(\tau-\gamma)}{\rho(1-\rho)\gamma} \\
&= \frac{1}{\rho(1-\rho)\gamma}\left[\frac{\rho^2+(1-\rho)^2}{2\rho(1-\rho)}\gamma^2 - (1-2\rho)^2ab - \left(\frac{\rho^2+(1-\rho)^2}{2\rho(1-\rho)} + 1\right)\gamma\tau + \left(1+(1-2\rho)^2\right)\tau^2\right] \\
&= \frac{1}{2\rho^2(1-\rho)^2\gamma}\left[\left(\rho^2+(1-\rho)^2\right)\tau^2 - \gamma\tau + 2\rho(1-\rho)ab\right] \\
&= \frac{\tau^2}{2\rho^2(1-\rho)^2\gamma}\left[\rho^2+(1-\rho)^2 - \sqrt{(1-2\rho)^2 + 4\rho(1-\rho)ab/\tau^2} + 2\rho(1-\rho)ab/\tau^2\right] \\
&\ge 0,
\end{aligned}
$$

where the last inequality follows because, by letting $x = ab/\tau^2$,

$$
\left(\rho^2 + (1-\rho)^2 + 2\rho(1-\rho)x\right)^2 - \left[(1-2\rho)^2 + 4\rho(1-\rho)x\right] = 4\rho^2(1-\rho)^2(x-1)^2 \ge 0.
$$

Thus $\eta$ is convex in $\rho$. □
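The symmetry and convexity claimed in Lemma 13 can be illustrated on a grid. The sketch below evaluates $\eta(\rho,a,b)$ from the expression recalled in the proof, with $\tau = \frac{a-b}{\log a - \log b}$ and $\gamma = \sqrt{(1-2\rho)^2\tau^2 + 4\rho(1-\rho)ab}$; the function name, the grid, and the values $a = 9$, $b = 1$ are illustrative assumptions.

```python
import math

def eta(rho, a, b):
    """Threshold function eta(rho, a, b) as recalled in the proof of Lemma 13."""
    tau = (a - b) / (math.log(a) - math.log(b))
    c = 1 - 2 * rho
    gamma = math.sqrt(c * c * tau * tau + 4 * rho * (1 - rho) * a * b)
    h = math.log((gamma + c * tau) * rho / ((gamma - c * tau) * (1 - rho)))
    return (a + b) / 2 - gamma + 0.5 * c * tau * h

# Illustrative (assumed) values; grid rho = 0.05, 0.10, ..., 0.95.
a, b = 9.0, 1.0
grid = [k / 20 for k in range(1, 20)]
values = [eta(rho, a, b) for rho in grid]

# Discrete second differences (convexity) and mirror-image gaps (symmetry).
second_differences = [values[k - 1] - 2 * values[k] + values[k + 1]
                      for k in range(1, len(values) - 1)]
asymmetry = max(abs(values[k] - values[-1 - k]) for k in range(len(values)))
eta_at_half = eta(0.5, a, b)
```

At $\rho = 1/2$ the expression reduces to $\frac{a+b}{2} - \sqrt{ab} = \frac{1}{2}(\sqrt{a}-\sqrt{b})^2$, the minimum of $\eta$ over $\rho$.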


B A data-driven choice of the penalization parameter in (5) 


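Fix $a > b$ such that $\sqrt{a} > \sqrt{b} + \sqrt{2}$ and fix $\rho \in (0, \frac{1}{2}]$. Recall that $p = a\log n/n$, $q = b\log n/n$, $K = \lceil\rho n\rceil$, and $\bar\rho = 1-\rho$. Let $d_i = \sum_j A_{ij}$ denote the degree of the $i$-th vertex and set $w_i = d_i/\log n$.

Set $w_- = a\rho + \bar\rho b$ and $w_+ = a\bar\rho + \rho b$. Then $\mathbb{E}[w_i] = w_- + O(1/n)$ or $w_+ + O(1/n)$ if $\sigma_i = 1$ or $-1$, and $w_+ \ge w_-$ with equality if and only if $\rho = 1/2$. Let $\bar{w} = \frac{1}{n}\sum_i w_i = \frac{2}{n\log n}\sum_{i<j} A_{ij}$, where $\sum_{i<j} A_{ij}$ is distributed as $\mathrm{Binom}\left(\binom{K}{2} + \binom{n-K}{2}, p\right)$ convolved with $\mathrm{Binom}(K(n-K), q)$. It follows from Bernstein's inequality that for any $c > 0$, there exists a constant $c' > 0$ such that with probability at least $1 - n^{-c}$,

$$
\left|\sum_{i<j}\left(A_{ij} - \mathbb{E}[A_{ij}]\right)\right| \le c'\sqrt{n}\log n.
$$

Thus $\bar{w} = \rho w_- + \bar\rho w_+ + O_P(n^{-1/2})$.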

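Set $\hat\rho = \frac{1}{n}\sum_i \mathbf{1}\{w_i \le \bar{w}\}$, $\hat{w}_+ = \frac{1}{n(1-\hat\rho)}\sum_i w_i\mathbf{1}\{w_i > \bar{w}\}$ and $\hat{w}_- = \frac{1}{n\hat\rho}\sum_i w_i\mathbf{1}\{w_i \le \bar{w}\}$, which are consistent estimates for $\rho, w_+, w_-$, respectively. From these we can readily obtain consistent estimates for $(a, b, \rho)$ whenever $\rho \ne 1/2$. Furthermore, when $\rho = 1/2$, we claim that

$$
\hat\rho = \frac{1}{2} + o_P(\log^{-1/9} n). \tag{74}
$$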


Now we are ready to choose the penalty parameter $\hat\lambda = \hat\lambda(A)$, so that Theorem 3 continues to hold upon replacing the deterministic $\lambda^*$ by $\hat\lambda$. Let


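$$
\hat\lambda = \begin{cases}
\dfrac{\bar{w}\log n}{n} & \left|\hat\rho - \frac{1}{2}\right| \le \log^{-1/9} n \\[3mm]
\dfrac{\hat{w}_+ - \hat{w}_-}{1-2\hat\rho}\cdot\dfrac{1}{\log\dfrac{(\hat{w}_+ + \hat{w}_-)(1-2\hat\rho) + (\hat{w}_+ - \hat{w}_-)}{(\hat{w}_+ + \hat{w}_-)(1-2\hat\rho) - (\hat{w}_+ - \hat{w}_-)}}\cdot\dfrac{\log n}{n} & \left|\hat\rho - \frac{1}{2}\right| > \log^{-1/9} n.
\end{cases}
\tag{75}
$$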

To verify the correctness of the SDP, it suffices to show that $\hat\lambda$ is close to the appropriate deterministic penalty term in probability. First consider $\rho = \frac{1}{2}$. Then (74) implies that $\hat\lambda = \left(\frac{a+b}{2} + o_P(1)\right)\frac{\log n}{n}$. The proof of [24, Theorem 2] for the binary symmetric SBM shows that any $\lambda > q$ suffices. Next consider $\rho \ne \frac{1}{2}$. Then $\rho \in (\epsilon, \frac{1}{2} - \epsilon)$ for some $\epsilon > 0$. Since $\hat\rho$ is a consistent estimator of $\rho$ when $\rho \ne 1/2$, it follows that $\hat\lambda$ is set according to the second case of (75). Recall that $\lambda^* = \tau\frac{\log n}{n}$ and $\tau = \frac{a-b}{\log a - \log b}$. Then $\hat\lambda = (\tau + o_P(1))\frac{\log n}{n}$ and the proof of Theorem 3 carries over.
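The plug-in step for $\rho \ne 1/2$ can be sketched as follows: split the normalized degrees $w_i$ at their mean $\bar{w}$, estimate $(\rho, w_-, w_+)$ from the two groups, and solve $w_+ + w_- = a + b$ and $w_+ - w_- = (1-2\rho)(a-b)$ for $(a,b)$. The function name, the group-mean normalization, and the idealized input (degrees exactly at their means for $a = 9$, $b = 1$, $\rho = 0.25$) are assumptions for illustration; the near-balanced case $\hat\rho \approx 1/2$ is deliberately not handled.

```python
import math

def estimate_parameters(w):
    """Plug-in estimates from normalized degrees w_i = d_i / log n (assumes rho_hat != 1/2)."""
    n = len(w)
    w_bar = sum(w) / n
    low = [v for v in w if v <= w_bar]    # presumed vertices with mean w_minus
    high = [v for v in w if v > w_bar]    # presumed vertices with mean w_plus
    rho_hat = len(low) / n
    w_minus = sum(low) / len(low)
    w_plus = sum(high) / len(high)
    gap = (w_plus - w_minus) / (1 - 2 * rho_hat)   # estimates a - b
    total = w_plus + w_minus                       # estimates a + b
    a_hat, b_hat = (total + gap) / 2, (total - gap) / 2
    tau_hat = (a_hat - b_hat) / (math.log(a_hat) - math.log(b_hat))
    return rho_hat, a_hat, b_hat, tau_hat

# Idealized input: w_minus = a*rho + (1-rho)*b = 3 and w_plus = a*(1-rho) + rho*b = 7.
w = [3.0] * 25 + [7.0] * 75
rho_hat, a_hat, b_hat, tau_hat = estimate_parameters(w)
```

A data-driven penalty would then be obtained as $\hat\tau\log n/n$, in line with $\lambda^* = \tau\log n/n$.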

It remains to prove (74). To this end, note that with high probability, $\bar{w} = \frac{a+b}{2} + O_P(n^{-1/2})$. Set $\delta_n = n^{-1/3}$ and define $\hat\rho_\pm = \frac{1}{n}\sum_i \mathbf{1}\{w_i \le (a+b)/2 \pm \delta_n\}$. Since $\hat\rho_- \le \hat\rho \le \hat\rho_+$ with probability tending to one, it suffices to show both $\hat\rho_\pm$ satisfy (74). We only consider $\hat\rho_-$ as the other case follows entirely analogously. Define $X = (w_1 - \mathbb{E}[w_1])/\sqrt{\mathrm{var}(w_1)}$. Denote by $F_X$ and $\Phi$ the cumulative distribution functions of $X$ and the standard normal distribution, respectively. It follows from the Berry-Esseen inequality that

$$
\sup_x |F_X(x) - \Phi(x)| = O(1/\sqrt{\log n}).
$$

Since $\mathbb{E}[w_1] = \frac{a+b}{2} + O(\frac{1}{n})$ and $\mathrm{var}(w_1) = \Theta(1/\log n)$, we have

$$
\begin{aligned}
\mathbb{E}[\hat\rho_-] = \mathbb{P}\left\{w_1 \le \frac{a+b}{2} - \delta_n\right\} &= F_X\left(O(\delta_n\sqrt{\log n})\right) \\
&= \Phi\left(O(\sqrt{\log n}/n^{1/3})\right) + O(1/\sqrt{\log n}) = 1/2 + O(1/\sqrt{\log n}).
\end{aligned}
\tag{76}
$$


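Moreover, by definition

$$
\mathrm{var}(\hat\rho_-) = \frac{1}{n}\mathbb{P}\left\{w_1 \le \frac{a+b}{2} - \delta_n\right\} + \frac{n(n-1)}{n^2}\mathbb{P}\left\{w_1, w_2 \le \frac{a+b}{2} - \delta_n\right\} - \mathbb{P}^2\left\{w_1 \le \frac{a+b}{2} - \delta_n\right\}. \tag{77}
$$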


Let $\tilde{d}_1 = \sum_{j\ne 2} A_{1j}$ and $\tilde{d}_2 = \sum_{j\ne 1} A_{2j}$. Let $\tilde{w}_i = \tilde{d}_i/\log n$ for $i = 1, 2$. Then

$$
\mathbb{P}\left\{w_1, w_2 \le \frac{a+b}{2} - \delta_n\right\} \le \mathbb{P}\left\{\tilde{w}_1, \tilde{w}_2 \le \frac{a+b}{2} - \delta_n\right\} = \mathbb{P}^2\left\{\tilde{w}_1 \le \frac{a+b}{2} - \delta_n\right\}, \tag{78}
$$

where the last equality holds because $\tilde{w}_1$ and $\tilde{w}_2$ are independent and identically distributed. Similarly to (76), we have $\mathbb{P}\{\tilde{w}_1 \le \frac{a+b}{2} - \delta_n\} = 1/2 + O(1/\sqrt{\log n})$. Therefore, in view of (77) and (78), we have that $\mathrm{var}(\hat\rho_-) = O(1/\sqrt{\log n})$. By Chebyshev's inequality, with probability at least $1 - \log^{-1/4} n$, $|\hat\rho_- - \mathbb{E}[\hat\rho_-]| \le O(\log^{-1/8} n)$, completing the proof.